How to get difference between rows and return new dataframe?

Time:06-08

Let's assume that I have the following dataframe:

Index.  0.  1.  2.  3.
  A.    10. 10. 10. 10.
  B.    20. 20. 20. 20.
  C.    30. 30. 30. 30.
  D.    40. 40. 40. 40.
  E.    50. 50. 50. 50.
  F.    50. 50. 50. 50.
  G.    50. 50. 50. 50.

If I want the difference between every third row (A is kept as is, then D - A, then G - D), the following dataframe should be generated:

Index.  0.  1.  2.  3.
  A.    10. 10. 10. 10.
  D.    30. 30. 30. 30.
  G.    10. 10. 10. 10.

So far I couldn't find an API for this behavior (diff() on its own does not behave like this).

Is there a way to achieve this?

CodePudding user response:

IIUC, you can slice and use diff:

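# keep every third row (A, D, G)
df2 = df.set_index('Index.').iloc[::3]
# diff() leaves the first row as NaN; fill it with the original values
out = df2.diff().fillna(df2)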

A less efficient alternative if you only want A/D/G, but useful if you also want other combinations such as B/E:

df2 = df.set_index('Index.')
df3 = df2.diff(3).fillna(df2)
df3.iloc[::3] # here you can slice differently for other combinations

output:

          0.    1.    2.    3.
Index.                        
A.      10.0  10.0  10.0  10.0
D.      30.0  30.0  30.0  30.0
G.      10.0  10.0  10.0  10.0
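
For anyone who wants to run both variants end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question (labels with their trailing dots copied as posted), and the variable names `indexed`, `every_third`, `lagged`, `same_rows` and `shifted_rows` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question; labels are copied from the post.
values = [10, 20, 30, 40, 50, 50, 50]
df = pd.DataFrame(
    {
        "Index.": ["A.", "B.", "C.", "D.", "E.", "F.", "G."],
        "0.": values,
        "1.": values,
        "2.": values,
        "3.": values,
    }
)
indexed = df.set_index("Index.")

# Variant 1: keep every third row, then take row-to-row differences.
every_third = indexed.iloc[::3]
out = every_third.diff().fillna(every_third)

# Variant 2: difference against the row three positions earlier, then slice.
lagged = indexed.diff(3).fillna(indexed)
same_rows = lagged.iloc[::3]      # A., D., G. (same result as variant 1)
shifted_rows = lagged.iloc[1::3]  # B., E.

print(out)
print(shifted_rows)
```

Changing the start of the slice (`iloc[1::3]`, `iloc[2::3]`) is what selects the other row combinations in the second variant.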

CodePudding user response:

diff(3) leaves the first three rows null (NaN). Since update ignores NaN values in its argument, you can just use update and then slice.

import pandas as pd

df = pd.DataFrame({'1': [10, 20, 30, 40, 50, 50, 50],
 '2': [10, 20, 30, 40, 50, 50, 50],
 '3': [10, 20, 30, 40, 50, 50, 50],
 '4': [10, 20, 30, 40, 50, 50, 50]})

# update() modifies df in place and skips the NaN rows produced by diff(3)
df.update(df.diff(3))
# keep every third row
df = df.iloc[::3]
print(df)

Output

      1     2     3     4
0  10.0  10.0  10.0  10.0
3  30.0  30.0  30.0  30.0
6  10.0  10.0  10.0  10.0
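
If you would rather not modify df in place, the same idea can be wrapped in a small function. This is only a sketch: `diff_every_n` is a hypothetical helper name, it swaps update for fillna, and it assumes the frame has no missing values of its own (otherwise fillna could put an original value where a difference was expected).

```python
import pandas as pd


def diff_every_n(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep every n-th row, where each kept row except
    the first holds its difference from the row n positions earlier."""
    lagged = frame.diff(n)
    # diff(n) leaves the first n rows as NaN; fall back to the original values.
    filled = lagged.fillna(frame)
    return filled.iloc[::n]


df = pd.DataFrame({str(c): [10, 20, 30, 40, 50, 50, 50] for c in range(1, 5)})

result = diff_every_n(df, 3)      # rows 0, 3, 6
result_two = diff_every_n(df, 2)  # rows 0, 2, 4, 6

print(result)
print(result_two)
```

Because the function returns a new frame, the original df is left unchanged and can be reused with a different step.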