I want to apply a different diff() to different columns in a pandas DataFrame. Below is an example that uses a conditional expression in a lambda function to take diff(1) on col1 and diff(2) on col2.
import pandas as pd
data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})
# diff(1) on col1, diff(2) on every other column
data.apply(lambda x: x.diff(1) if x.name == 'col1' else x.diff(2))
I first thought about a solution with a dictionary, similar to the agg function. That would be easier when there are more than two columns. Does anyone have a handy method for applying different diff() operations to different columns?
CodePudding user response:
If every operation returns a Series the same size as the original column, like diff or cumsum, it is possible to use DataFrame.agg:
df = data.agg({'col1': lambda x: x.diff(), 'col2': lambda x: x.diff(2)})
print(df)
col1 col2
0 NaN NaN
1 10.0 NaN
2 12.0 27.0
3 8.0 5.0
4 14.0 6.0
5 0.0 9.0
6 11.0 30.0
7 11.0 28.0
8 24.0 7.0
9 -11.0 1.0
10 21.0 130.0
11 2.0 133.0
12 0.0 2.0
13 22.0 22.0
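Since diff keeps the length of the column, the same dictionary idea can also be written with DataFrame.transform, which is intended for same-length operations. This is only a sketch of that alternative, not part of the answer above; the variable name result is an assumption.

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})

# Map each column name to a function that returns a Series of the same length.
result = data.transform({
    'col1': lambda s: s.diff(1),
    'col2': lambda s: s.diff(2),
})
print(result)
```

Only the columns named in the dictionary appear in the result.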
Mixing a same-size operation with an aggregation such as mean in one call does not work:
df = data.agg({'col1': lambda x: x.diff(), 'col2': 'mean'})
print(df)
ValueError: cannot perform both aggregation and transformation operations simultaneously
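One way around this error, sketched here under the assumption that the two results do not need to live in the same DataFrame, is to compute the same-size operation and the aggregation separately. The names col1_diff and col2_mean are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})

# Transformation: a Series with the same length as the original column.
col1_diff = data['col1'].diff()

# Aggregation: a single scalar for the whole column.
col2_mean = data['col2'].mean()

print(col1_diff)
print(col2_mean)
```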
CodePudding user response:
One easy option could be to use a dictionary to hold the periods:
periods = {'col1': 1, 'col2': 2}
data.apply(lambda c: c.diff(periods[c.name]))
output:
col1 col2
0 NaN NaN
1 10.0 NaN
2 12.0 27.0
3 8.0 5.0
4 14.0 6.0
...
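With more columns, periods[c.name] raises a KeyError for any column missing from the dictionary. A possible variation, shown only as a sketch, is to fall back to a default period with dict.get. The col3 column and the default_period name are hypothetical and added here purely for illustration.

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})
# Hypothetical extra column that is not listed in the periods dictionary.
data['col3'] = data['col1'] + data['col2']

periods = {'col1': 1, 'col2': 2}
default_period = 1  # assumed fallback for columns without an entry

# Look up the period by column name, falling back to the default.
result = data.apply(lambda c: c.diff(periods.get(c.name, default_period)))
print(result)
```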