Home > Mobile >  Calculate different between rows with pandas.diff()
Calculate different between rows with pandas.diff()

Time:04-26

I'm trying to calculate the difference between each row but I have an issue that I don't know how to solve. Below is my sample code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Place': ['Hanoi','Hanoi','Hanoi','Hanoi','Hanoi','Hochiminh','Hochiminh','Hochiminh','Hochiminh','Hochiminh'],
    'Date': ['2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05','2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05'],
    'Number': [0,2,4,6,8,12,17,20,26,28]})

And it's the Output:

Place   Date    Number
0   Hanoi   2022-04-01  0
1   Hanoi   2022-04-02  2
2   Hanoi   2022-04-03  4
3   Hanoi   2022-04-04  6
4   Hanoi   2022-04-05  8
5   Hochiminh   2022-04-01  12
6   Hochiminh   2022-04-02  17
7   Hochiminh   2022-04-03  20
8   Hochiminh   2022-04-04  26
9   Hochiminh   2022-04-05  28

And then I used diff to calculate different number:

df['diff'] = df['Number'].diff()

and it's Output:

    Place   Date    Number  diff
0   Hanoi   2022-04-01  0   NaN
1   Hanoi   2022-04-02  2   2.0
2   Hanoi   2022-04-03  4   2.0
3   Hanoi   2022-04-04  6   2.0
4   Hanoi   2022-04-05  8   2.0
5   Hochiminh   2022-04-01  12  4.0
6   Hochiminh   2022-04-02  17  5.0
7   Hochiminh   2022-04-03  20  3.0
8   Hochiminh   2022-04-04  26  6.0
9   Hochiminh   2022-04-05  28  2.0

As you see row 5 of column diff is calculating by subtract row 5 of column number and row 4 of column number but it's not what I want. I want row 5 will be reset to NaN or 0 because I want to calculate different number for each place.

This is my expected Output

    Place   Date    Number  diff
0   Hanoi   2022-04-01  0   NaN
1   Hanoi   2022-04-02  2   2.0
2   Hanoi   2022-04-03  4   2.0
3   Hanoi   2022-04-04  6   2.0
4   Hanoi   2022-04-05  8   2.0
5   Hochiminh   2022-04-01  12  0.0
6   Hochiminh   2022-04-02  17  5.0
7   Hochiminh   2022-04-03  20  3.0
8   Hochiminh   2022-04-04  26  6.0
9   Hochiminh   2022-04-05  28  2.0

CodePudding user response:

df['diff'] = df.groupby('Place')['Number'].diff().fillna(0)

df

       Place        Date  Number  diff
0      Hanoi  2022-04-01       0   0.0
1      Hanoi  2022-04-02       2   2.0
2      Hanoi  2022-04-03       4   2.0
3      Hanoi  2022-04-04       6   2.0
4      Hanoi  2022-04-05       8   2.0
5  Hochiminh  2022-04-01      12   0.0
6  Hochiminh  2022-04-02      17   5.0
7  Hochiminh  2022-04-03      20   3.0
8  Hochiminh  2022-04-04      26   6.0
9  Hochiminh  2022-04-05      28   2.0
  • Related