Calculate neighbouring differences from a column in a pandas dataframe after grouping

Time:04-01

I want to calculate the differences between neighbouring values in a dataframe column named 'x' after grouping by column 'y', and store the results in a third column 'z'.

Example:

import numpy as np
import pandas as pd

np.random.seed(10)

df = pd.DataFrame(np.random.randint(0, 13, size=13), columns=['x'])
df['y']= ['a','a','a','b','b','b','b','c','c','d','e','e','e']

print(df)

#     x  y
#0    9  a
#1    4  a
#2    0  a
#3    1  b
#4   11  b
#5   12  b
#6    9  b
#7    0  c
#8    1  c
#9   10  d
#10   8  e
#11   9  e
#12   0  e

groups=df.groupby('y')

for name,group in groups:
    print(group['x'].diff().shift(-1))

#0   -5.0
#1   -4.0
#2    NaN
#Name: x, dtype: float64
#3    10.0
#4     1.0
#5    -3.0
#6     NaN
#Name: x, dtype: float64
#7    1.0
#8    NaN
#Name: x, dtype: float64
#9   NaN
#Name: x, dtype: float64
#10    1.0
#11   -9.0
#12    NaN
#Name: x, dtype: float64

The content of the new dataframe should be:

#     x  y    z
#0    9  a -5.0
#1    4  a -4.0 
#2    0  a  NaN
#3    1  b 10.0
#4   11  b  1.0
#5   12  b -3.0
#6    9  b  NaN
#7    0  c  1.0
#8    1  c  NaN
#9   10  d  NaN
#10   8  e  1.0
#11   9  e -9.0
#12   0  e  NaN

My final goal is to produce a histogram of the data in 'z'.

CodePudding user response:

Use:

df['z'] = df.groupby('y')['x'].transform('diff').shift(-1)
print(df)
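
Note that the one-liner above shifts the whole column rather than each group, so it gives the expected result only when the rows of each group are adjacent, as they are in the question's example. If the groups may be interleaved, shifting inside each group is safer. The following is a minimal sketch of that variant; the interleaved sample data and the name z_alt are made up for illustration, and the bin count of 5 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Values from groups 'a' and 'b' of the question, with the rows
# deliberately interleaved (an assumption made for illustration).
df = pd.DataFrame({
    'x': [9, 1, 4, 11, 0, 12, 9],
    'y': ['a', 'b', 'a', 'b', 'a', 'b', 'b'],
})

# Next value within the same group minus the current value.
# The last row of each group has no successor and becomes NaN.
df['z'] = df.groupby('y')['x'].shift(-1) - df['x']

# Equivalent form: diff(-1) is current minus next, so flip the sign.
z_alt = -df.groupby('y')['x'].diff(-1)

print(df)

# For the histogram, drop the NaN rows first.
counts, edges = np.histogram(df['z'].dropna(), bins=5)
print(counts, edges)
```

To draw the histogram rather than just count the bins, df['z'].dropna().plot.hist() should work if matplotlib is installed.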