Home > Software engineering >  Pandas - Running total depending on other columns
Pandas - Running total depending on other columns

Time:12-14

I have a df:

id, start_date, end_date
1, 2021-09-01, 2021-09-25
1, 2021-10-01, 2021-10-25
2, 2021-09-01, 2021-09-25
2, 2021-09-15, 2021-10-15
1, 2021-11-01, 2021-11-25

I need to have a running count by ID depending on the two date columns, like so:

id, start_date, end_date, count
1, 2021-09-01, 2021-09-25, 0
1, 2021-10-01, 2021-10-25, 1
2, 2021-09-01, 2021-09-25, 0
2, 2021-09-15, 2021-10-15, 0
1, 2021-11-01, 2021-11-25, 2

The big difficulty I am having is making sure to count rows where the end date in the previous row is less than the start date of the next row. I am counting completed "transactions" that are before a start date.

I haven't tried any code yet because I'm not even sure how to tackle the problem.

CodePudding user response:

IIUC:

inc_count = lambda x: x['start_date'].gt(x['end_date'].shift()).cumsum()
df['count'] = df.groupby('id').apply(inc_count).droplevel('id')
print(df)

# Output:
   id start_date   end_date  count
0   1 2021-09-01 2021-09-25      0
1   1 2021-10-01 2021-10-25      1
2   2 2021-09-01 2021-09-25      0
3   2 2021-09-15 2021-10-15      0
4   1 2021-11-01 2021-11-25      2
  • Related