I would like to add the cumulative daily sum of the column ['PR'] by month-year while also grouping by ['PERMNO'].
Here is a snippet of my dataframe:
PERMNO date PR
0 10025 2012-09-17 0
1 10025 2012-09-18 0
2 10025 2012-09-19 0
3 10025 2012-09-20 1
4 10025 2012-09-21 0
... ... ... ...
8567613 93436 2017-12-24 1
8567614 93436 2017-12-26 1
8567615 93436 2017-12-29 0
8567616 93436 2017-12-30 1
8567617 93436 2017-12-31 0
I would like to get something like this:
PERMNO date PR
0 10025 2012-09 10
1 10025 2012-10 6
2 10025 2012-11 4
3 10025 2012-12 20
4 10073 2012-01 5
... ... ... ...
n-4 93436 2017-08 15
n-3 93436 2017-09 17
n-2 93436 2017-10 4
n-1 93436 2017-11 13
n 93436 2017-12 10
I'm mainly using Pandas and NumPy.
Thank you in advance.
CodePudding user response:
Here is the result when summing by PERMNO and year-month, though the expected result in the question does not match the provided data
df['date'] = pd.to_datetime(df['date'])
df.groupby(['PERMNO', df['date'].dt.strftime('%Y-%m') ]).sum().reset_index()
PERMNO date PR
0 10025 2012-09 1
1 93436 2017-12 3