Here is my dataset. I would like to add a column that holds, for each row, the sum of the values from earlier dates within the same group (that is, the values in the rows above it in that group). The first row of each group should therefore always be 0.
group | date | value |
---|---|---|
1 | 10/04/2022 | 2 |
1 | 12/04/2022 | 3 |
1 | 17/04/2022 | 5 |
1 | 22/04/2022 | 1 |
2 | 11/04/2022 | 3 |
2 | 15/04/2022 | 2 |
2 | 17/04/2022 | 4 |
The result I want looks like this. Could you give me an idea of how to create such a column?
group | date | value | sum |
---|---|---|---|
1 | 10/04/2022 | 2 | 0 |
1 | 12/04/2022 | 3 | 2 |
1 | 17/04/2022 | 5 | 5 |
1 | 22/04/2022 | 1 | 10 |
2 | 11/04/2022 | 3 | 0 |
2 | 15/04/2022 | 2 | 3 |
2 | 17/04/2022 | 4 | 5 |
CodePudding user response:
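You can use groupby.transform and, within each group, call Series.cumsum() followed by shift(). The cumulative sum includes the current row, so shifting it down by one row leaves only the earlier values. The first row of each group becomes NaN after the shift, which fillna(0) then replaces with 0.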
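import pandas as pd

df['sum'] = (df
    # parse the day-first dates and sort, if the rows are not already in date order
    .assign(date=pd.to_datetime(df['date'], dayfirst=True))
    .sort_values(['group', 'date'])
    .groupby('group')['value']
    # running total per group, shifted down one row to exclude the current value
    .transform(lambda col: col.cumsum().shift())
    # the first row of each group has no earlier values
    .fillna(0))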
print(df)
   group        date  value   sum
0      1  10/04/2022      2   0.0
1      1  12/04/2022      3   2.0
2      1  17/04/2022      5   5.0
3      1  22/04/2022      1  10.0
4      2  11/04/2022      3   0.0
5      2  15/04/2022      2   3.0
6      2  17/04/2022      4   5.0
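The shift introduces NaN, which is why the sum column above comes out as float. If integer results are preferred, one possible variation (a sketch, not part of the original answer) is to take the per-group cumulative sum and subtract the current row's value. The DataFrame below is rebuilt from the question's table, and the names ordered and running are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "group": [1, 1, 1, 1, 2, 2, 2],
    "date": ["10/04/2022", "12/04/2022", "17/04/2022", "22/04/2022",
             "11/04/2022", "15/04/2022", "17/04/2022"],
    "value": [2, 3, 5, 1, 3, 2, 4],
})

# Sort a copy by group and parsed (day-first) date; the original index is kept.
ordered = (df
    .assign(date=pd.to_datetime(df["date"], dayfirst=True))
    .sort_values(["group", "date"]))

# Running total per group, including the current row.
running = ordered.groupby("group")["value"].cumsum()

# Removing the current row's value leaves the sum of the earlier rows only.
# Assignment aligns on the index, so the row order of df is unchanged.
df["sum"] = running - ordered["value"]

print(df)
```

Because no NaN is ever produced, the column keeps the integer dtype of value and no fillna step is needed.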