Manipulate pandas dataframe with custom function

Time:04-12

I am having trouble applying a custom function to a dataframe. The function works fine and returns the correct dataframe. However, after applying it, my dataframe is still the old one.

My DataFrame is like this:

import pandas as pd

d = {'col 1': ['a', 'a', 'a', 'b', 'b', 'b'], 'col 2': [1, 1, 2, 2, 1, 2]}
df = pd.DataFrame(data=d)
df.set_index('col 1')


I wrote a little test function to group by col 1 and find the highest value in col 2 for each group:

def tester(x):
    x = x.groupby('col 1', group_keys=False).apply(lambda x: x.nlargest(1, 'col 2'))
    return x

Calling the function on my dataframe then returns the correctly grouped result:

tester(df)


However, outside the function, df is still the old one. Why does this happen? This behavior does not seem to occur with lists or dictionaries, for instance. How can I continue working with the manipulated df after the function call?

Thanks!

CodePudding user response:

You just need to store the return value, either in a new variable or back in the same variable to replace the original:

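import pandas as pd

d = {'col 1': ['a', 'a', 'a', 'b', 'b', 'b'], 'col 2': [1, 1, 2, 2, 1, 2]}
df = pd.DataFrame(data=d)
# set_index also returns a new DataFrame; without an assignment this line leaves df unchanged
df.set_index('col 1')


def tester(x):
    # Keep the row with the largest 'col 2' value in each 'col 1' group
    x = x.groupby('col 1', group_keys=False).apply(lambda x: x.nlargest(1, 'col 2'))
    return x

nw_DF = tester(df)  # store the returned (new) DataFrame
print(nw_DF)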

CodePudding user response:

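Short answer: The assignment x = ... inside the tester function rebinds the local name x to a new DataFrame. It does not modify the object that df refers to.

What this means: groupby(...).apply(...) builds and returns a new DataFrame instead of changing the original in place. Assigning that result to x only makes the local name x point at the new object, thus breaking the link to the DataFrame 'outside'. Lists and dictionaries behave the same way: only in-place operations such as append() are visible to the caller, while a plain assignment inside the function is not. Sure, x is returned to the caller, yet the caller is not assigning the returned value to anything. So the DataFrame (df) remains unchanged.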
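To make the difference between rebinding a name and changing an object in place concrete, here is a minimal sketch using a plain list. The function names rebind and mutate are made up for this illustration and do not come from the question.

```python
def rebind(items):
    # Builds a NEW list and points the local name 'items' at it;
    # the caller's list is untouched.
    items = items + [4]
    return items


def mutate(items):
    # Changes the SAME list object the caller passed in.
    items.append(4)


numbers = [1, 2, 3]

rebind(numbers)                # return value is discarded
after_rebind = list(numbers)   # snapshot of the caller's list

mutate(numbers)
after_mutate = list(numbers)

print(after_rebind)  # [1, 2, 3]
print(after_mutate)  # [1, 2, 3, 4]
```

So a list looks "different" only when the function uses an in-place method. The tester function is the rebind case, which is why its return value has to be assigned by the caller.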

The fix is simple. Change this:

tester(df)

To this:

df = tester(df)