I am creating a running count of claims and would like to reset the count to 0 at the start of every month.
I currently have these data frames:
df1:
Date | X | Y |
---|---|---|
2021-04-25 | 4 | 4 |
2021-04-26 | 0 | 0 |
2021-04-27 | 0 | 0 |
2021-04-28 | 56 | 53 |
2021-04-29 | 0 | 0 |
2021-04-30 | 1 | 0 |
2021-05-01 | 0 | 0 |
2021-05-02 | 5 | 0 |
2021-05-03 | 5 | 5 |
2021-05-04 | 0 | 0 |
df2:
Date | X | Y |
---|---|---|
2021-04-25 | 4 | 4 |
2021-04-26 | 0 | 0 |
2021-04-27 | 0 | 0 |
2021-04-28 | 56 | 53 |
2021-04-29 | 3 | 0 |
2021-04-30 | 4 | 0 |
2021-05-01 | 4 | 0 |
2021-05-02 | 9 | 0 |
2021-05-03 | 14 | 5 |
2021-05-04 | 9 | 0 |
To get df2, I used the following code:
df2 = df1.copy()  # copy, so that df1 is not modified as well
df2['X'] = df1['X'] + (df1['X'] - df1['Y']).cumsum().shift(1).fillna(0)
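Written out, the intended calculation adds to each day's X the net X - Y of all earlier rows (`shift(1)` excludes the current row, and `fillna(0)` gives the first row a carry of 0):

$$X^{\text{df2}}_t = X_t + \sum_{s < t} (X_s - Y_s)$$

For example, on 2021-05-03: 5 + (0 + 0 + 0 + 3 + 0 + 1 + 0 + 5) = 14, which is the value in df2.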
I am unsure how or where in this formula to add a condition that resets the count to 0 at the start of every month, so that I get the dataset below.
df_final:
Date | X | Y |
---|---|---|
2021-04-25 | 4 | 4 |
2021-04-26 | 0 | 0 |
2021-04-27 | 0 | 0 |
2021-04-28 | 56 | 53 |
2021-04-29 | 3 | 0 |
2021-04-30 | 4 | 0 |
2021-05-01 | 0 | 0 |
2021-05-02 | 5 | 0 |
2021-05-03 | 10 | 5 |
2021-05-04 | 5 | 0 |
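In formula form, df_final adds to each day's X only the net X - Y of earlier rows in the same calendar month, where m(t) (notation introduced here for illustration) denotes the month of row t:

$$X^{\text{final}}_t = X_t + \sum_{s < t,\; m(s) = m(t)} (X_s - Y_s)$$

On 2021-05-03 only 2021-05-01 and 2021-05-02 count: 5 + (0 - 0) + (5 - 0) = 10. Without the reset, the April rows would contribute another 3 + 1 = 4, giving the 14 seen in df2.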
Answer:
This should work for you:
import pandas as pd
df = pd.DataFrame([
('2021-04-25', 4, 4),
('2021-04-26', 0, 0),
('2021-04-27', 0, 0),
('2021-04-28', 56, 53),
('2021-04-29', 0, 0),
('2021-04-30', 1, 0),
('2021-05-01', 0, 0),
('2021-05-02', 5, 0),
('2021-05-03', 5, 5),
('2021-05-04', 0, 0),
], columns=['Date', 'X', 'Y',])
df['Date'] = pd.to_datetime(df['Date'])
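# Running total without a reset: each day's X plus the net (X - Y) of all earlier days
running = (df['X'] - df['Y'].shift(fill_value=0)).cumsum()
# Carry already accumulated when each month's first row is reached
# (running total minus that row's own X); to_period('M') keeps different years apart
month = df['Date'].dt.to_period('M')
offset = (running - df['X']).groupby(month).transform('first')
# Subtracting the offset restarts the carry at 0 in every month
result = df.assign(X=running - offset)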
Output:
Date X Y
0 2021-04-25 4 4
1 2021-04-26 0 0
2 2021-04-27 0 0
3 2021-04-28 56 53
4 2021-04-29 3 0
5 2021-04-30 4 0
6 2021-05-01 0 0
7 2021-05-02 5 0
8 2021-05-03 10 5
9 2021-05-04 5 0
Answer:
Try:
df.groupby(df['Date'].dt.month)[['X', 'Y']].cumsum()
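Grouping by month alone gives a plain per-month running total of X and Y, which is not df_final. A minimal sketch of how the grouping could be combined with the question's carry formula, assuming the df1 data from the question: the helper names `month`, `net` and `carry` are illustrative, and `dt.to_period('M')` is used instead of `dt.month` so that the same month in different years is not merged.

```python
import pandas as pd

# Sample data from the question (df1)
df1 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-04-25", "2021-04-26", "2021-04-27", "2021-04-28", "2021-04-29",
        "2021-04-30", "2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04",
    ]),
    "X": [4, 0, 0, 56, 0, 1, 0, 5, 5, 0],
    "Y": [4, 0, 0, 53, 0, 0, 0, 0, 5, 0],
})

# Calendar-month key (hypothetical name); keeps May 2021 and May 2022 apart
month = df1["Date"].dt.to_period("M")

# Net change per row, as in the question's formula
net = df1["X"] - df1["Y"]

# Within each month: running sum of net, shifted one row so each row only
# sees the carry from earlier rows of the same month (first row gets 0)
carry = net.groupby(month).transform(lambda s: s.cumsum().shift(fill_value=0))

df_final = df1.assign(X=df1["X"] + carry)
print(df_final)
```

Doing the shift inside each group, rather than on the whole column, is what makes the first row of every month start from its own X with no carry from the previous month.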