I have a df:
id, start_date, end_date
1, 2021-09-01, 2021-09-25
1, 2021-10-01, 2021-10-25
2, 2021-09-01, 2021-09-25
2, 2021-09-15, 2021-10-15
1, 2021-11-01, 2021-11-25
I need to have a running count by ID depending on the two date columns, like so:
id, start_date, end_date, count
1, 2021-09-01, 2021-09-25, 0
1, 2021-10-01, 2021-10-25, 1
2, 2021-09-01, 2021-09-25, 0
2, 2021-09-15, 2021-10-15, 0
1, 2021-11-01, 2021-11-25, 2
The big difficulty I am having is making sure to count rows where the end date in the previous row is less than the start date of the next row. I am counting completed "transactions" that are before a start date.
I haven't tried any code yet because I'm not even sure how to tackle the problem.
CodePudding user response:
IIUC:
inc_count = lambda x: x['start_date'].gt(x['end_date'].shift()).cumsum()
df['count'] = df.groupby('id').apply(inc_count).droplevel('id')
print(df)
# Output:
id start_date end_date count
0 1 2021-09-01 2021-09-25 0
1 1 2021-10-01 2021-10-25 1
2 2 2021-09-01 2021-09-25 0
3 2 2021-09-15 2021-10-15 0
4 1 2021-11-01 2021-11-25 2