Create a column that contains the sum of rows above within group

Time:08-11

Here's the dataset I've got. I would like to create a column containing, for each row, the sum of the values on earlier dates within the same group (that is, the sum of the values in the rows above it). The first row of each group should therefore always be 0.

group date value
1 10/04/2022 2
1 12/04/2022 3
1 17/04/2022 5
1 22/04/2022 1
2 11/04/2022 3
2 15/04/2022 2
2 17/04/2022 4

The column I want would look like this. Could you give me an idea how to create such a column?

group date value sum
1 10/04/2022 2 0
1 12/04/2022 3 2
1 17/04/2022 5 5
1 22/04/2022 1 10
2 11/04/2022 3 0
2 15/04/2022 2 3
2 17/04/2022 4 5

CodePudding user response:

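You can use groupby.transform with Series.cumsum().shift(): cumsum() gives the running total within each group, and shift() moves it down one row so that each row only sees the rows above it. The first row of each group becomes NaN, which fillna(0) replaces with 0.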

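import pandas as pd

# df is the question's dataframe with columns group, date, value
df['sum'] = (df
             # parse the day-first date strings so the sort is chronological
             .assign(date=pd.to_datetime(df['date'], dayfirst=True))
             # sort the dataframe if needed
             .sort_values(['group', 'date'])
             .groupby('group')['value']
             # running total, shifted down one row to exclude the current row
             .transform(lambda col: col.cumsum().shift())
             # the first row of each group has nothing above it
             .fillna(0))
print(df)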

   group        date  value   sum
0      1  10/04/2022      2   0.0
1      1  12/04/2022      3   2.0
2      1  17/04/2022      5   5.0
3      1  22/04/2022      1  10.0
4      2  11/04/2022      3   0.0
5      2  15/04/2022      2   3.0
6      2  17/04/2022      4   5.0
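
If the float column (0.0, 2.0, ...) is unwanted, one possible variation is to subtract each row's own value from the grouped running total, which avoids the NaN step and keeps the integer dtype. This is a minimal sketch, not part of the original answer: the dataframe is rebuilt from the question's sample data, and the `ordered` and `_date` names are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group": [1, 1, 1, 1, 2, 2, 2],
    "date": ["10/04/2022", "12/04/2022", "17/04/2022", "22/04/2022",
             "11/04/2022", "15/04/2022", "17/04/2022"],
    "value": [2, 3, 5, 1, 3, 2, 4],
})

# Sort a copy by group and parsed date so "above" means "earlier in time".
# The helper column _date is hypothetical and never written back to df.
ordered = (df.assign(_date=pd.to_datetime(df["date"], dayfirst=True))
             .sort_values(["group", "_date"]))

# Running total within each group, minus the row's own value,
# leaves the sum of the strictly earlier rows. Assignment aligns on the index,
# so the original row order of df is preserved.
df["sum"] = ordered.groupby("group")["value"].cumsum() - ordered["value"]
print(df)
```

Because no NaN is ever produced, the result stays integer-valued and matches the column requested in the question.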