My DataFrame looks like this:
id | date | value |
---|---|---|
1 | 2021-07-16 | 100 |
2 | 2021-09-15 | 20 |
1 | 2021-04-10 | 50 |
1 | 2021-08-27 | 30 |
2 | 2021-07-22 | 15 |
2 | 2021-07-22 | 25 |
1 | 2021-06-30 | 40 |
3 | 2021-10-11 | 150 |
2 | 2021-08-03 | 15 |
1 | 2021-07-02 | 90 |
I want to groupby the id, and return the difference of total value in a 90-days period. Specifically, I want the values of last 90 days based on today, and based on 30 days ago.
For example, considering today is 2021-10-13, I would like to get:
- the sum of all values per id between 2021-10-13 and 2021-07-15
- the sum of all values per id between 2021-09-13 and 2021-06-15
And finally, subtract them to get the variation.
I've already managed to calculate it, by creating separated temporary dataframes containing only the dates in those periods of 90 days, grouping by id, and then merging these temp dataframes into a final one. But I guess it should be an easier or simpler way to do it. Appreciate any help!
Btw, sorry if the explanation was a little messy.
CodePudding user response:
If I understood correctly, you need something like this:
import pandas as pd
import datetime
## Calculation of the dates that we are gonna need.
today = datetime.datetime.now()
delta = datetime.timedelta(days = 120)
# Date of the 120 days ago
hundredTwentyDaysAgo = today - delta
delta = datetime.timedelta(days = 90)
# Date of the 90 days ago
ninetyDaysAgo = today - delta
delta = datetime.timedelta(days = 30)
# Date of the 30 days ago
thirtyDaysAgo = today - delta
## Initializing an example df.
df = pd.DataFrame({"id":[1,2,1,1,2,2,1,3,2,1],
"date": ["2021-07-16", "2021-09-15", "2021-04-10", "2021-08-27", "2021-07-22", "2021-07-22", "2021-06-30", "2021-10-11", "2021-08-03", "2021-07-02"],
"value": [100,20,50,30,15,25,40,150,15,90]})
## Casting date column
df['date'] = pd.to_datetime(df['date']).dt.date
grouped = df.groupby('id')
# Sum of last 90 days per id
ninetySum = grouped.apply(lambda x: x[x['date'] >= ninetyDaysAgo.date()]['value'].sum())
# Sum of last 90 days, starting from 30 days ago per id
hundredTwentySum = grouped.apply(lambda x: x[(x['date'] >= hundredTwentyDaysAgo.date()) & (x['date'] <= thirtyDaysAgo.date())]['value'].sum())
The output is
ninetySum - hundredTwentySum
id
1 -130
2 20
3 150
dtype: int64
You can double check to make sure these are the numbers you wanted by printing ninetySum
and hundredTwentySum
variables.