iterate over a dictionary of dataframes and apply function to each dataframe

Time:09-07

I have a dictionary of dataframes, each one loaded from a CSV file in a folder.

I want to apply this function to every dataframe in the dictionary:

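def transformations(df):
    # Non-empty cells of the first two rows become the new column names
    row_1 = list(df.iloc[0])
    row_1_upd = list(filter(None, row_1))  # filter(None, ...) drops falsy cells such as None and ''
    row_2 = list(df.iloc[1])
    row_2_upd = list(filter(None, row_2))
    new_cols = row_1_upd + row_2_upd
    list_df_col = list(df.columns)

    # Rename each existing column with the next collected label
    for i in range(len(list_df_col)):
        list_df_col[i] = new_cols[i]

    df.columns = list_df_col
    # Drop the two rows that were used as headers
    df = df.iloc[2:]

    return df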
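One caveat about filter(None, ...): it removes None and empty strings, but not NaN, because NaN is truthy in Python. If the CSVs were read with pandas' default settings, blank cells arrive as NaN and would end up among the column names. A minimal sketch of a NaN-aware variant, assuming the same two-header-row layout; the name transformations_nan_safe and the example data are hypothetical:

```python
import pandas as pd


def transformations_nan_safe(df):
    """Hypothetical variant of transformations() that also skips NaN cells."""
    # Keep only real labels: pd.notna drops None/NaN, the != "" check drops empty strings
    row_1 = [v for v in df.iloc[0] if pd.notna(v) and v != ""]
    row_2 = [v for v in df.iloc[1] if pd.notna(v) and v != ""]
    new_cols = row_1 + row_2
    # Drop the two header rows and rename the columns on a copy,
    # so the caller's dataframe is left untouched
    out = df.iloc[2:].copy()
    out.columns = new_cols[: len(df.columns)]  # assumes at least one label per column
    return out


# Hypothetical data: a header split across two rows, with blanks in between
example = pd.DataFrame([
    ["Region", None, None],
    [None, "2021", "2022"],
    ["North", 10, 12],
])
print(list(transformations_nan_safe(example).columns))  # ['Region', '2021', '2022']
```

Working on a copy is a design choice: unlike the original, which renames the columns of the dictionary's frame in place before slicing, this version only changes what it returns.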

I can also extract the name of each dataframe stored in the dictionary:

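import os

filenames = os.listdir(r'C:\Users\xxxx\Python_Projects\Analytic_PBI')  # lists every entry in the folder (here, the csv files)

def extract_name_files(text):  # removes .csv from the name of each file
    # os.path.splitext removes only the extension; text.strip('.csv') would strip any of
    # the characters '.', 'c', 's', 'v' from both ends ('sales.csv' -> 'ale')
    name_file = os.path.splitext(text)[0]
    return name_file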

names_of_files = list(map(extract_name_files,filenames)) 
print(names_of_files)

['Analytic_2021_Actual_Budget_FTE', 'Analytic_2021_Actual_Budget_MarginDirectCost', 'Analytic_2021_Actual_Budget_OperationalHqOverheadsCost', 'Analytic_2022_Actual_FTE', 'Analytic_2022_Actual_MarginDirectCost', 'Analytic_2022_Actual_OperationalHqOverheadsCost', 'Analytic_2022_F1_U_FTE', 'Analytic_2022_F1_U_MarginDirectCost', 'Analytic_2022_F1_U_OperationalHqOverheadsCost']
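The question doesn't show how dataStorage was built. A minimal sketch that builds it straight from the folder, keyed by file name without the extension, using pathlib's Path.stem instead of os.listdir plus extract_name_files. The function name load_csv_folder is hypothetical, and header=None is an assumption: it keeps the two header rows as ordinary rows so that transformations() can turn them into column names.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Hypothetical helper: return {file name without extension: DataFrame} for each .csv."""
    # glob("*.csv") skips files that are not CSVs; sorted() gives a stable key order
    # header=None (an assumption) leaves the header rows in the data for transformations()
    return {
        path.stem: pd.read_csv(path, header=None)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

With the folder from the question, this would be called as dataStorage = load_csv_folder(r'C:\Users\xxxx\Python_Projects\Analytic_PBI').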

My idea was to loop over the dictionary dataStorage, look up each dataframe by name using names_of_files, and apply the function to each one:

for i, df in dataStorage.items():
    df = dataStorage[names_of_files[i]]
    df = transformations(df)

but I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [359], in <cell line: 1>()
      1 for i, df in dataStorage.items():
----> 2     df = dataStorage[names_of_files[i]]
      3     df = transformations(df)

TypeError: list indices must be integers or slices, not str

I know that applying the function directly to a single entry works, because I tested it:

dataStorage["Analytic_2021_Actual_Budget_FTE"] = transformations(dataStorage["Analytic_2021_Actual_Budget_FTE"])

I'm probably overcomplicating this, but my knowledge here is limited.

Answer:

I think you are overcomplicating things. Look at your loop:

for i, df in dataStorage.items():
    df = dataStorage[names_of_files[i]]
    df = transformations(df)

If the goal of this loop is just to get each df, you don't need names_of_files at all. The following works just as well, because dataStorage.items() already gives you each df in the header of the for loop:

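for key, df in dataStorage.items():
    # df already comes from the loop header, so names_of_files is not needed;
    # store the result back, because reassigning df alone leaves dataStorage unchanged
    dataStorage[key] = transformations(df)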
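Writing the result back under each key matters because transformations() returns a new object: df.columns = list_df_col does change the stored frame in place, but df = df.iloc[2:] creates a new dataframe, so rebinding df inside the loop never drops the header rows from the frames in the dictionary. A minimal sketch of the difference, with a hypothetical stand-in function and data:

```python
import pandas as pd


def drop_header_rows(df):
    # Hypothetical stand-in for transformations(): like df.iloc[2:], it returns a NEW dataframe
    return df.iloc[2:]


frames = {"a": pd.DataFrame({"x": [1, 2, 3]}), "b": pd.DataFrame({"x": [4, 5, 6, 7]})}

# Rebinding the loop variable only changes the local name df;
# the dictionary still holds the original, untransformed frames
for name, df in frames.items():
    df = drop_header_rows(df)
lengths_after_rebinding = {k: len(v) for k, v in frames.items()}
print(lengths_after_rebinding)  # {'a': 3, 'b': 4}

# Building a new dictionary from the results keeps the transformed frames
frames = {name: drop_header_rows(df) for name, df in frames.items()}
lengths_after_rebuild = {k: len(v) for k, v in frames.items()}
print(lengths_after_rebuild)  # {'a': 1, 'b': 2}
```

The dict comprehension and assigning to dataStorage[key] inside the loop give the same result; replacing the values of existing keys while iterating is safe because the dictionary's size does not change.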

The error you're seeing happens because i is not a position like the one you'd get from calling enumerate; it is a key of the dataStorage dictionary (presumably a string), so you can't use it to index into the names_of_files list. You can put print statements in the for loop to see exactly what i and df are. You shouldn't need any of this extra work, though.
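A minimal sketch of what .items() and enumerate actually yield, using hypothetical stand-ins (strings in place of the dataframes, keys copied from the file names in the question) and reproducing the same TypeError:

```python
# Hypothetical stand-ins: strings replace the dataframes, keys mimic the file names
data_storage = {"Analytic_2022_Actual_FTE": "frame A", "Analytic_2022_F1_U_FTE": "frame B"}
names_of_files = ["Analytic_2022_Actual_FTE", "Analytic_2022_F1_U_FTE"]

# .items() yields (key, value) pairs: the first element is the key, a str
keys_seen = [key for key, _ in data_storage.items()]

# enumerate() puts an integer position in front of each (key, value) pair
positions_seen = [i for i, (key, _) in enumerate(data_storage.items())]

# Using a str key as a list index reproduces the error from the question
try:
    names_of_files[keys_seen[0]]
except TypeError as exc:
    error_message = str(exc)

print(keys_seen)       # ['Analytic_2022_Actual_FTE', 'Analytic_2022_F1_U_FTE']
print(positions_seen)  # [0, 1]
print(error_message)   # list indices must be integers or slices, not str
```

If a position were really needed, for i, (key, df) in enumerate(dataStorage.items()) would provide both, but here the key alone is enough to write the result back.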
