I have a dictionary of dataframes, each one loaded from a CSV file in a folder.
I want to apply this function to every dataframe in the dictionary:
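def transformations(df):
    # The first two rows hold the header fragments; drop falsy cells (e.g. None or '')
    row_1 = list(df.iloc[0])
    row_1_upd = list(filter(None, row_1))
    row_2 = list(df.iloc[1])
    row_2_upd = list(filter(None, row_2))
    # Concatenate both fragments into the new list of column names
    new_cols = row_1_upd + row_2_upd
    list_df_col = list(df.columns)
    for i in range(len(list_df_col)):
        list_df_col[i] = new_cols[i]
    df.columns = list_df_col
    # Drop the two header rows from the data
    df = df.iloc[2:]
    return df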
I am also able to extract the name of each dataframe that is stored in the dictionary:
filenames = os.listdir(r'C:\Users\xxxx\Python_Projects\Analytic_PBI') # lists all csv files in your directory
def extract_name_files(text): # removes .csv from the name of each file
    name_file = text.strip('.csv')
    return name_file
names_of_files = list(map(extract_name_files,filenames))
print(names_of_files)
['Analytic_2021_Actual_Budget_FTE', 'Analytic_2021_Actual_Budget_MarginDirectCost', 'Analytic_2021_Actual_Budget_OperationalHqOverheadsCost', 'Analytic_2022_Actual_FTE', 'Analytic_2022_Actual_MarginDirectCost', 'Analytic_2022_Actual_OperationalHqOverheadsCost', 'Analytic_2022_F1_U_FTE', 'Analytic_2022_F1_U_MarginDirectCost', 'Analytic_2022_F1_U_OperationalHqOverheadsCost']
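One caveat about the strip('.csv') call: str.strip treats its argument as a set of characters to remove from both ends, not as a suffix. It happens to give the right result for the file names listed here, but a name that starts or ends with '.', 'c', 's' or 'v' would lose extra characters. A minimal sketch of the difference, using a made-up file name ('costs.csv' is hypothetical, not one of the files in the folder) and two illustrative helper names:

```python
import os

def strip_name(text):
    # str.strip removes any of the characters '.', 'c', 's', 'v' from both ends
    return text.strip('.csv')

def splitext_name(text):
    # os.path.splitext splits off only the final extension
    return os.path.splitext(text)[0]

example = 'costs.csv'  # hypothetical file name, for illustration only
print(strip_name(example))     # prints: ost
print(splitext_name(example))  # prints: costs
```

For a name such as 'Analytic_2022_Actual_FTE.csv' both helpers return the same string, which is why the output above looks correct.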
So what I thought to do was loop over the dictionary dataStorage, calling the dataframes by name using names_of_files, and apply the function to each dataframe:
for i, df in dataStorage.items():
    df = dataStorage[names_of_files[i]]
    df = transformations(df)
but I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [359], in <cell line: 1>()
1 for i, df in dataStorage.items():
----> 2 df = dataStorage[names_of_files[i]]
3 df = transformations(df)
TypeError: list indices must be integers or slices, not str
I know if I apply my function directly like this:
dataStorage["Analytic_2021_Actual_Budget_FTE"] = transformations(dataStorage["Analytic_2021_Actual_Budget_FTE"])
it works; I have tested it.
I'm probably overcomplicating things here, but my knowledge of this is limited.
CodePudding user response:
I think you are over-complicating things.
for i, df in dataStorage.items():
    df = dataStorage[names_of_files[i]]
    df = transformations(df)
If the goal of this loop is just to access df, then you don't need names_of_files at all. The following should work just as well because you get df from the header of the for loop:
for i, df in dataStorage.items():
    # df = dataStorage[names_of_files[i]]
    df = transformations(df)
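One thing to watch: assigning to df inside the loop only rebinds the loop variable; it does not replace the entry in dataStorage. If the transformed dataframes should end up back in the dictionary, assign by key. A minimal sketch, using plain lists of rows as stand-ins for the dataframes and a hypothetical drop_header_rows function in place of transformations (both are assumptions made for illustration):

```python
def drop_header_rows(rows):
    # Stand-in for transformations(): returns a NEW object without the first two rows
    return rows[2:]

# Stand-in for dataStorage: keys are file names, values are "tables"
data_storage = {
    'file_a': ['h1', 'h2', 'r1', 'r2'],
    'file_b': ['h1', 'h2', 'r3'],
}

# Rebinding the loop variable leaves the dictionary untouched
for name, rows in data_storage.items():
    rows = drop_header_rows(rows)
unchanged = {name: list(rows) for name, rows in data_storage.items()}

# Assigning by key stores the result; replacing values of existing keys
# while iterating is fine because the set of keys does not change
for name, rows in data_storage.items():
    data_storage[name] = drop_header_rows(rows)

print(unchanged['file_a'])     # prints: ['h1', 'h2', 'r1', 'r2']
print(data_storage['file_a'])  # prints: ['r1', 'r2']
```

With the real objects, the same idea can be written as a dict comprehension: dataStorage = {name: transformations(df) for name, df in dataStorage.items()}.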
The error you're seeing is because i is not an index like you'd get from calling enumerate, it's instead the key of the dataStorage dictionary (a string?) and so you can't use it to index into your names_of_files list. You can put print statements in the for-loop to see exactly what i and df are. You shouldn't need to do all this work though.
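To make the difference concrete, the sketch below compares what .items() and enumerate yield and reproduces the same TypeError. The dictionary values are placeholder strings rather than real dataframes, and the two keys are taken from the file names in the question; the snake_case variable names are my own:

```python
names_of_files = ['Analytic_2021_Actual_Budget_FTE', 'Analytic_2022_Actual_FTE']
# Placeholder values; in the question these are dataframes
data_storage = {name: 'df for ' + name for name in names_of_files}

# .items() yields (key, value) pairs, so the first element is a string key
keys_seen = [key for key, value in data_storage.items()]

# enumerate yields (position, item) pairs, so the first element is an int
positions_seen = [i for i, name in enumerate(names_of_files)]

# Indexing the list with a string key raises the error from the question
try:
    names_of_files[keys_seen[0]]
except TypeError as exc:
    error_message = str(exc)

print(keys_seen)
print(positions_seen)   # prints: [0, 1]
print(error_message)    # prints: list indices must be integers or slices, not str
```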