I am having trouble applying a custom function to a DataFrame. The function works fine and returns the correct DataFrame. However, after I call it, my DataFrame is still the old one.
My DataFrame is like this:
d = {'col 1' : ['a', 'a', 'a', 'b', 'b', 'b'],'col 2' : [1, 1, 2, 2, 1, 2]}
df = pd.DataFrame(data = d)
df.set_index('col 1')
I wrote a little test function to group by col 1 and find the highest value in col 2 for each group:
def tester(x):
    x = x.groupby('col 1', group_keys=False).apply(lambda x: x.nlargest(1, 'col 2'))
    return x
Calling the function on my DataFrame returns the correctly grouped result:
tester(df)
However, outside the function, df is still the old one. Why does this happen? This behavior does not occur with lists or dictionaries, for instance. How can I continue working with the manipulated df after the function call?
Thanks!
CodePudding user response:
You just need to store the return value in a new variable, or assign it back to the same variable to replace the original:
import pandas as pd
d = {'col 1' : ['a', 'a', 'a', 'b', 'b', 'b'],'col 2' : [1, 1, 2, 2, 1, 2]}
df = pd.DataFrame(data = d)
df.set_index('col 1')
def tester(x):
    x = x.groupby('col 1', group_keys=False).apply(lambda x: x.nlargest(1, 'col 2'))
    return x
nw_DF = tester(df)  # new DataFrame
print(nw_DF)
CodePudding user response:
Short answer:
Once the assignment occurs inside the tester function, the name on the left side of the assignment is rebound within the local scope of the function.
What this means:
When x = ... runs, the local name x is rebound to a new DataFrame, which breaks its link to the DataFrame 'outside'. Sure, x is returned to the caller, yet the caller is not assigning the returned value to anything. So the DataFrame (df) remains unchanged.
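Lists behave the same way, which may help make the rule concrete. The sketch below uses only plain Python; the function names rebind and mutate are made up for illustration and are not part of the original code. It contrasts assigning to a parameter name (the caller sees nothing) with changing the object in place (the caller sees the change):

```python
def rebind(items):
    # Hypothetical helper: the assignment binds the local name
    # to a brand-new list, so the caller's list is untouched.
    items = items + [4]
    return items


def mutate(items):
    # Hypothetical helper: append() changes the existing object
    # in place, so the caller sees the change.
    items.append(4)


original = [1, 2, 3]

rebind(original)                 # return value is discarded
after_discard = list(original)   # snapshot: still [1, 2, 3]

mutate(original)                 # in-place change is visible
after_mutate = list(original)    # snapshot: [1, 2, 3, 4]

original = rebind(original)      # keep the result by assigning it
after_assign = list(original)    # snapshot: [1, 2, 3, 4, 4]

print(after_discard, after_mutate, after_assign)
```

The tester function falls into the first case: groupby(...).apply(...) builds a new DataFrame and x = ... only points the local name at it, so the result has to be captured from the return value.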
The fix is simple. Change this:
tester(df)
To this:
df = tester(df)