Different diff operations on different columns

I want to apply a different diff() operation to each column of a pandas DataFrame. Below is an example that uses an if-expression inside a lambda function to take diff(1) on col1 and diff(2) on col2.

import pandas as pd

data = pd.DataFrame({'col1':[32,42,54,62,76,76,87,98,122,111,132,134,134,156],
                    'col2':[32,58,59,63,65,72,95,100,102,101,232,234,234,256]})

data.apply(lambda x: x.diff(1) if x.name=='col1' else x.diff(2))

My first thought was a dictionary-based solution, similar to what the agg function accepts. That would be easier to maintain when there are more than two columns. Does anyone know a handy way to apply different diff() operations to different columns?

CodePudding user response:

If every operation returns a Series of the same length as the original column, as diff or cumsum do, you can use DataFrame.agg:

df = data.agg({'col1':lambda x: x.diff(), 'col2':lambda x: x.diff(2)})
print (df)
    col1   col2
0    NaN    NaN
1   10.0    NaN
2   12.0   27.0
3    8.0    5.0
4   14.0    6.0
5    0.0    9.0
6   11.0   30.0
7   11.0   28.0
8   24.0    7.0
9  -11.0    1.0
10  21.0  130.0
11   2.0  133.0
12   0.0    2.0
13  22.0   22.0

Combining a transformation such as diff with a reduction such as 'mean' in the same call fails, however:

df = data.agg({'col1':lambda x: x.diff(), 'col2':'mean'})
print (df)

ValueError: cannot perform both aggregation and transformation operations simultaneously
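
Since every function here is a same-length transformation, DataFrame.transform may be a more explicit fit: it also accepts a dictionary mapping column labels to functions. The following is a minimal sketch, assuming the data frame from the question; the name result is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})

# Map each column label to its own transformation. Each function must
# return a Series with the same length as the column it receives.
result = data.transform({'col1': lambda x: x.diff(),
                         'col2': lambda x: x.diff(2)})

print(result.head())
```

Only the columns named in the dictionary appear in the result, so any column that should pass through unchanged would need its own entry.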

CodePudding user response:

One easy option could be to use a dictionary to hold the periods:

periods = {'col1': 1, 'col2': 2}

data.apply(lambda c: c.diff(periods[c.name]))

Output:

    col1   col2
0    NaN    NaN
1   10.0    NaN
2   12.0   27.0
3    8.0    5.0
4   14.0    6.0
...
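
With more columns, the dictionary lookup above raises a KeyError for any column that has no entry. One way around that, sketched here under assumptions, is to fall back to a default period with dict.get. The helper name diff_by_column, its default parameter, and the extra column col3 are hypothetical and not part of the original question.

```python
import pandas as pd


def diff_by_column(df, periods, default=1):
    """Hypothetical helper: diff each column by its own period.

    Columns missing from `periods` fall back to `default`.
    """
    return df.apply(lambda c: c.diff(periods.get(c.name, default)))


# Shortened version of the question's data; col3 is made up for illustration.
data = pd.DataFrame({'col1': [32, 42, 54, 62, 76],
                     'col2': [32, 58, 59, 63, 65],
                     'col3': [1, 4, 9, 16, 25]})

# Only col2 needs a non-default period; col1 and col3 use diff(1).
result = diff_by_column(data, {'col2': 2})
print(result)
```

Keeping the periods in a plain dictionary means a new column only requires a new entry, not another branch in the lambda.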