and I have the following dataset.
Date | Area | Value | Delivery_Date |
---|---|---|---|
01-01-2021 | ABC | 10 | 02-01-2021 |
02-01-2021 | BCD | 20 | 04-01-2021 |
03-01-2021 | ABC | 15 | 04-01-2021 |
04-01-2021 | BCD | 25 | 05-01-2021 |
05-01-2021 | ABC | 15 | 06-01-2021 |
I have to create a new column named Sum which follows following conditions.
It should take the current row and should look for values from other rows which has same Area and date greater than current row date and delivery date greater than current row "date" as well. It should sum up the values from rows which meet these three conditions. So the resulting table will look like
Date | Area | Value | Delivery_Date | Sum |
---|---|---|---|---|
01-01-2021 | ABC | 10 | 02-01-2021 | 40 |
02-01-2021 | BCD | 20 | 04-01-2021 | 45 |
03-01-2021 | ABC | 15 | 04-01-2021 | 30 |
04-01-2021 | BCD | 25 | 05-01-2021 | 25 |
05-01-2021 | ABC | 15 | 06-01-2021 | 15 |
I am not sure how to start. Can someone help me?
CodePudding user response:
IIUC, use a GroupBy
expanding.sum
after sorting the data on the dates (recent to ancient):
# ensure datetime (although this format could be also sorted as string)
df['Date'] = pd.to_datetime(df['Date'])
df['sum'] = (df
.sort_values(by='Date', ascending=False) # reverse values
.groupby(['Area'])['Value'].expanding().sum() # sum recent values
.droplevel(0)
)
output:
Date Area Value sum
0 2021-01-01 ABC 10 40.0
1 2021-02-01 BCD 20 45.0
2 2021-03-01 ABC 15 30.0
3 2021-04-01 BCD 25 25.0
4 2021-05-01 ABC 15 15.0