I need to calculate the difference within each consecutive pair of rows, but all the solutions I have found so far, such as rolling, diff, etc., do not jump from one pair to the next.
To explain, I need to get an output column like this:
a output
0 5 -3
1 8 -2
2 2 nan
3 4 nan
so (5-8) and (2-4) are my results.
I tried this which doesn't "jump":
df['output'] = df['C'] - df['C'].shift(-1)
CodePudding user response:
I would use numpy for that:
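import numpy as np
N = 2
# column-major reshape puts a[:N] in column 0 and a[N:] in column 1,
# so np.diff along the rows gives a[i + N] - a[i]
df.loc[df.index[:N], 'output'] = np.diff(df['a'].to_numpy().reshape(N, -1, order='F')).ravel()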
With pandas:
N = 2
df['output'] = df['a'].iloc[:N].rsub(df['a'].iloc[N:].values)
output:
a output
0 5 -3.0
1 8 -4.0
2 2 NaN
3 4 NaN
Another example, with N=3:
a output
0 5 -3.0
1 8 -4.0
2 7 -6.0
3 2 NaN
4 4 NaN
5 1 NaN
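To reproduce the N=3 table end to end, here is a minimal self-contained sketch. The input values are read off the `a` column above, the helper name `half_diff` is hypothetical, and slicing the second half with `iloc[n:2 * n]` is my assumption so that frames with more than 2*n rows do not raise a length mismatch:

```python
import pandas as pd

def half_diff(df, n, col='a'):
    """Hypothetical helper: a[i + n] - a[i] for the first n rows, NaN elsewhere."""
    out = df.copy()
    # rsub computes other - self; .values drops the index so rows pair up by position
    out['output'] = out[col].iloc[:n].rsub(out[col].iloc[n:2 * n].values)
    return out

df = pd.DataFrame({'a': [5, 8, 7, 2, 4, 1]})
result = half_diff(df, 3)
print(result)
```

The assignment aligns on the index, so only rows 0 to n-1 receive a value and the rest stay NaN.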
CodePudding user response:
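Another way is to use groupby:
# first minus second within each pair of rows; the second row of a pair gets NaN
df['output'] = df.groupby(df.index // 2)['a'].diff(-1)
# shift everything after the first row up by one, using .loc to avoid chained assignment
df.loc[df.index[1:], 'output'] = df['output'].iloc[1:].shift(-1)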
df
'''
a output
0 5 -3.0
1 8 -2.0
2 2 nan
3 4 nan
'''
This is how it works with different data. I'm not sure exactly what you want; if this is wrong, please say so in a comment and I will delete it.
import pandas as pd
df = pd.DataFrame(data={'a': [5, 8, 2, 4, 10, 20, 10, 232, 323]})
df['output'] = df.groupby(df.index // 2)['a'].diff(-1)
# .loc avoids the chained assignment of df['output'].iloc[1:] = ...
df.loc[df.index[1:], 'output'] = df['output'].iloc[1:].shift(-1)
df
'''
a output
0 5 -3.0
1 8 -2.0
2 2 nan
3 4 -10.0
4 10 nan
5 20 -222.0
6 10 nan
7 232 nan
8 323 nan
'''
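If the intent is instead to pack every pair difference at the top of the column, a possible variant is sketched below. It assumes a default RangeIndex and no missing values in `a` (since `dropna` would also discard differences that are NaN for other reasons); the variable name `pair_diff` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 8, 2, 4, 10, 20, 10, 232, 323]})

# first minus second within each consecutive pair; the second row of a pair
# and an unpaired trailing row come out as NaN
pair_diff = df.groupby(df.index // 2)['a'].diff(-1)

# drop the NaN slots and renumber from 0 so index alignment packs results at the top
df['output'] = pair_diff.dropna().reset_index(drop=True)
print(df)
```

With this data the first four rows should hold the differences of the pairs (5, 8), (2, 4), (10, 20) and (10, 232), and the unpaired last value 323 contributes nothing.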