I want to calculate the differences between neighbouring numbers in a dataframe column named 'x' after grouping by column 'y', and store the results in a third column 'z'. How can I do this?
Example:
import numpy as np
import pandas as pd
np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 13, size=13), columns=['x'])
df['y']= ['a','a','a','b','b','b','b','c','c','d','e','e','e']
print(df)
# x y
#0 9 a
#1 4 a
#2 0 a
#3 1 b
#4 11 b
#5 12 b
#6 9 b
#7 0 c
#8 1 c
#9 10 d
#10 8 e
#11 9 e
#12 0 e
groups = df.groupby('y')
for name, group in groups:
    print(group['x'].diff().shift(-1))
#0 -5.0
#1 -4.0
#2 NaN
#Name: x, dtype: float64
#3 10.0
#4 1.0
#5 -3.0
#6 NaN
#Name: x, dtype: float64
#7 1.0
#8 NaN
#Name: x, dtype: float64
#9 NaN
#Name: x, dtype: float64
#10 1.0
#11 -9.0
#12 NaN
#Name: x, dtype: float64
The content of the new dataframe should be:
# x y z
#0 9 a -5.0
#1 4 a -4.0
#2 0 a NaN
#3 1 b 10.0
#4 11 b 1.0
#5 12 b -3.0
#6 9 b NaN
#7 0 c 1.0
#8 1 c NaN
#9 10 d NaN
#10 8 e 1.0
#11 9 e -9.0
#12 0 e NaN
My final goal is to produce a histogram of the data in 'z'.
Answer:
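Use groupby with transform('diff') to get the per-group differences, then shift(-1) to move each difference up to the first row of its pair: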
df['z'] = df.groupby('y')['x'].transform('diff').shift(-1)
print(df)
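Note that the one-liner above applies shift(-1) to the whole column after the per-group diff. That gives the expected 'z' here because the rows of each group are contiguous and the first diff of every group is NaN. If the rows of a group may not be adjacent, a variant that shifts inside each group is safer. The following is a minimal sketch under that assumption; it writes out the 'x' values from the question's printed output rather than relying on the random seed, and the bin count of 5 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Same data as printed in the question, written out explicitly.
df = pd.DataFrame({
    'x': [9, 4, 0, 1, 11, 12, 9, 0, 1, 10, 8, 9, 0],
    'y': ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'd', 'e', 'e', 'e'],
})

# Next value within the same group minus the current value.
# The last row of every group has no successor, so it becomes NaN.
df['z'] = df.groupby('y')['x'].shift(-1) - df['x']

# Drop the NaN rows before binning; bins=5 is an illustrative choice.
z = df['z'].dropna()
counts, edges = np.histogram(z, bins=5)

print(df)
print(counts)
print(edges)
```

To draw the histogram instead of only counting, df['z'].plot.hist() should work if matplotlib is installed; it skips the NaN values on its own.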