I would like to write a function that takes multiple dataframes with the same structure, applies specific transformations, and saves those transformations in place.
Dummy dataframes
import pandas as pd

df = pd.DataFrame({"Full name": ["John Doe", "Deep Smith", "Julia Carter", "Kate Newton", "Sandy Thompson"],
                   "Monthly Sales": [25, 30, 35, 40, 45]})
df2 = pd.DataFrame({"Full name": ["Alicia Williams", "Kriten John", "Jessica Adams", "Isaac Newton", "Whitney Gordon"],
                    "Monthly Sales": [35, 20, 50, 15, 40]})
Transformation function
I don't want to return the dataframe, but rather save those transformations in place.
def tidy_dfs(dfs):
    for df in dfs:
        # Drop first row
        df = df.iloc[1:, :]
        # Replace spaces in columns
        df.columns = [c.replace(' ', '_') for c in df]
        # Change cols to lower
        df.columns = [c.lower() for c in df]
    return df
Saving the result with df, df2 = tidy_dfs([df, df2]) of course won't work, because the return statement is outside the loop, so only the last transformed dataframe is returned.
What would be a way to call this function so that the transformations are saved in place, i.e. simply:
tidy_dfs([df,df2])
Answer:
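EDIT: When you pass a list of DataFrames, you can either return a new list (out) or modify the DataFrames inside the existing list dfs. Reassigning the loop variable (e.g. df = df.iloc[1:, :]) only rebinds a local name, so with that style the new DataFrames must be assigned back by the caller, as in the first solution. Only methods that mutate the objects themselves (inplace=True, as in the last solution) change df and df2 directly.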
Your function does not return a list of DataFrames, so create an empty list and append each cleaned DataFrame to it:
def tidy_dfs(dfs):
    out = []
    for df in dfs:
        # Drop first row
        df = df.iloc[1:, :]
        # Replace spaces in columns
        df.columns = [c.replace(' ', '_') for c in df]
        # Change cols to lower
        df.columns = [c.lower() for c in df]
        out.append(df)
    return out
df,df2 = tidy_dfs([df,df2])
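A slightly more compact variant of the same return-and-assign-back approach is sketched below, assuming every DataFrame gets the same three steps. The helper name `tidy` is hypothetical: it cleans one frame and returns a new DataFrame, and the caller rebinds its own names from the resulting list.

```python
import pandas as pd

def tidy(frame: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: returns a new, cleaned DataFrame and leaves `frame` untouched
    out = frame.iloc[1:, :]  # drop the first row
    # Replace spaces with underscores and lowercase every column name
    return out.rename(columns=lambda c: c.replace(" ", "_").lower())

df = pd.DataFrame({"Full name": ["John Doe", "Deep Smith", "Julia Carter"],
                   "Monthly Sales": [25, 30, 35]})
df2 = pd.DataFrame({"Full name": ["Alicia Williams", "Kriten John", "Jessica Adams"],
                    "Monthly Sales": [35, 20, 50]})

# Rebind the caller's names to the new DataFrames
df, df2 = [tidy(d) for d in (df, df2)]
print(df.columns.tolist())  # ['full_name', 'monthly_sales']
print(len(df2))             # 2
```

Keeping the per-frame logic in its own function makes it reusable with a single DataFrame as well as with a list.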
For inplace operations:
def tidy_dfs(dfs):
    for df in dfs:
        # Drop first row
        df.drop(df.index[0], inplace=True)
        # Replace spaces in columns and lowercase
        df.rename(columns=lambda x: x.replace(' ', '_').lower(), inplace=True)
    return dfs
df, df2 = tidy_dfs([df,df2])
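To check that the in-place version really modifies the original objects, the sketch below (reusing a shortened copy of the question's dummy data) calls `tidy_dfs` without assigning the result back. This works because the list holds references to the same DataFrame objects that `df` and `df2` point to, and `drop`/`rename` with `inplace=True` change those objects directly.

```python
import pandas as pd

def tidy_dfs(dfs):
    for frame in dfs:
        # Drop the first row of this very object
        frame.drop(frame.index[0], inplace=True)
        # Replace spaces with underscores and lowercase the column names in place
        frame.rename(columns=lambda x: x.replace(" ", "_").lower(), inplace=True)

df = pd.DataFrame({"Full name": ["John Doe", "Deep Smith", "Julia Carter"],
                   "Monthly Sales": [25, 30, 35]})
df2 = pd.DataFrame({"Full name": ["Alicia Williams", "Kriten John", "Jessica Adams"],
                    "Monthly Sales": [35, 20, 50]})

tidy_dfs([df, df2])                # no assignment back
print(df.columns.tolist())         # ['full_name', 'monthly_sales']
print(df["full_name"].tolist())    # ['Deep Smith', 'Julia Carter']
```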
Answer:
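The problem is that you cannot make the caller's variables point to a new DataFrame from inside the function: an assignment such as df = df.iloc[1:, :] only rebinds the local name. In addition, most pandas methods return a new object and leave the original DataFrame unchanged unless you explicitly ask for inplace=True.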
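A minimal illustration of that difference, using two hypothetical helper functions: `rebind` only reassigns its local parameter, while `mutate` changes the DataFrame object it received.

```python
import pandas as pd

def rebind(frame):
    # Hypothetical helper: only the local name `frame` now points to a new DataFrame;
    # the caller's variable is unaffected
    frame = frame.iloc[1:, :]

def mutate(frame):
    # Hypothetical helper: modifies the DataFrame object itself, so the caller sees the change
    frame.drop(frame.index[0], inplace=True)

df = pd.DataFrame({"Monthly Sales": [25, 30, 35]})

rebind(df)
print(len(df))  # 3: the caller's df is unchanged
mutate(df)
print(len(df))  # 2: the first row was removed from the original object
```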
It is possible to drop all the existing data and then insert the new values in place at the end of the loop, but this is "ugly" (error-prone and cumbersome).