Create a column that contains the sum of rows above within group

Here's the dataset I've got. I would like to create a column containing, for each row, the sum of the values from earlier dates (that is, the values in the rows above it) within the same group. The first row of each group should therefore always be 0.

group	date	value
1	10/04/2022	2
1	12/04/2022	3
1	17/04/2022	5
1	22/04/2022	1
2	11/04/2022	3
2	15/04/2022	2
2	17/04/2022	4

The result I want looks like this. Could you give me an idea of how to create such a column?

group	date	value	sum
1	10/04/2022	2	0
1	12/04/2022	3	2
1	17/04/2022	5	5
1	22/04/2022	1	10
2	11/04/2022	3	0
2	15/04/2022	2	3
2	17/04/2022	4	5

CodePudding user response:

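You can try groupby.transform with Series.cumsum().shift(). Within each group, cumsum() gives the running total, shift() moves it down one row so that each row only sees the rows above it, and fillna(0) fills in the first row of each group, which has nothing above it.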

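import pandas as pd

df['sum'] = (df
             # parse the dates (day first) so the sort is chronological; skip if already sorted
             .assign(date=pd.to_datetime(df['date'], dayfirst=True))
             .sort_values(['group', 'date'])
             .groupby('group')['value']
             # running total per group, shifted down one row to exclude the current row
             .transform(lambda col: col.cumsum().shift())
             # the first row of each group has no rows above it
             .fillna(0))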

print(df)

   group        date  value   sum
0      1  10/04/2022      2   0.0
1      1  12/04/2022      3   2.0
2      1  17/04/2022      5   5.0
3      1  22/04/2022      1  10.0
4      2  11/04/2022      3   0.0
5      2  15/04/2022      2   3.0
6      2  17/04/2022      4   5.0
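
If you would rather keep the column as integers (shift() introduces NaN, which is why the result above is float), one alternative is to take the per-group running total and subtract the current row's own value. This is only a sketch and not part of the answer above: it rebuilds the sample data from the question, the intermediate name `ordered` is just for illustration, and it assumes the dates are in day/month/year format.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group": [1, 1, 1, 1, 2, 2, 2],
    "date": ["10/04/2022", "12/04/2022", "17/04/2022", "22/04/2022",
             "11/04/2022", "15/04/2022", "17/04/2022"],
    "value": [2, 3, 5, 1, 3, 2, 4],
})

# Assumption: dates are day/month/year; parse them so the sort is chronological
ordered = (df
           .assign(date=pd.to_datetime(df["date"], format="%d/%m/%Y"))
           .sort_values(["group", "date"]))

# Running total including the current row, minus the current row itself,
# leaves the sum of the rows above it; assignment aligns on the index
df["sum"] = ordered.groupby("group")["value"].cumsum() - ordered["value"]

print(df["sum"].tolist())  # [0, 2, 5, 10, 0, 3, 5]
```

Because nothing is shifted, no NaN appears and no fillna is needed.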