Let's assume that I have the following dataframe:
Index. 0. 1. 2. 3.
A. 10. 10. 10. 10.
B. 20. 20. 20. 20.
C. 30. 30. 30. 30.
D. 40. 40. 40. 40.
E. 50. 50. 50. 50.
F. 50. 50. 50. 50.
G. 50. 50. 50. 50.
If I want the difference between every third row (each row minus the row three positions above it, keeping the first row as is), the following dataframe should be generated:
Index. 0. 1. 2. 3.
A. 10. 10. 10. 10.
D. 30. 30. 30. 30.
G. 10. 10. 10. 10.
So far I couldn't find an API for this behavior (diff() on its own does not behave like this).
Is there a way to achieve this?
CodePudding user response:
IIUC, you can slice and use diff:
df2 = df.set_index('Index.').iloc[::3]
out = df2.diff().fillna(df2)
A less efficient alternative if you only want A/D/G, but useful if you also want B/E, etc.:
df2 = df.set_index('Index.')
df3 = df2.diff(3).fillna(df2)
df3.iloc[::3] # here you can slice differently for other combinations
output:
0. 1. 2. 3.
Index.
A. 10.0 10.0 10.0 10.0
D. 30.0 30.0 30.0 30.0
G. 10.0 10.0 10.0 10.0
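For reference, a self-contained sketch of both variants above. The construction of df is an assumption reconstructed from the table in the question (an 'Index.' column plus columns '0.' to '3.'); the names full and out_alt are only illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's dataframe
values = [10, 20, 30, 40, 50, 50, 50]
df = pd.DataFrame({
    "Index.": ["A.", "B.", "C.", "D.", "E.", "F.", "G."],
    "0.": values, "1.": values, "2.": values, "3.": values,
})

# Variant 1: keep every third row, then diff consecutive kept rows;
# the first row has no predecessor, so fill it from the sliced frame
df2 = df.set_index("Index.").iloc[::3]
out = df2.diff().fillna(df2)

# Variant 2: diff against the row three positions earlier, then slice
full = df.set_index("Index.")
out_alt = full.diff(3).fillna(full).iloc[::3]

print(out)
print(out.equals(out_alt))
```

Changing the slice in the second variant to iloc[1::3] would give the B/E rows instead.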
CodePudding user response:
diff(3) will leave the first three rows null, and update ignores null values, so those rows keep their original values. That means you can just use update and then slice.
import pandas as pd
df = pd.DataFrame({'1': [10, 20, 30, 40, 50, 50, 50],
                   '2': [10, 20, 30, 40, 50, 50, 50],
                   '3': [10, 20, 30, 40, 50, 50, 50],
                   '4': [10, 20, 30, 40, 50, 50, 50]})
df.update(df.diff(3))
df = df.iloc[::3]
print(df)
Output
1 2 3 4
0 10.0 10.0 10.0 10.0
3 30.0 30.0 30.0 30.0
6 10.0 10.0 10.0 10.0
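To see why update is enough here, a small single-column sketch may help. It assumes a float column (chosen so that no dtype conversion is involved); the names d, nan_rows and result are only illustrative.

```python
import pandas as pd

# One float column with the same values as in the question
df = pd.DataFrame({"1": [10, 20, 30, 40, 50, 50, 50]}, dtype="float64")

# Each row minus the row three positions earlier; rows 0-2 have no such row
d = df.diff(3)
nan_rows = d["1"].isna().tolist()

# update only overwrites with non-null values, so rows 0-2 stay untouched
df.update(d)
result = df["1"].tolist()

print(nan_rows)
print(result)
print(df.iloc[::3])
```

Rows 0 to 2 keep their original values while the remaining rows hold the differences, so taking every third row afterwards yields the requested result.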