count number of consecutive dates and group by ID

Time:01-18

I have a data frame about user activity that looks like this:


| ID     |  week| date    | 
|------- |------| ------- |
| 1      | 1    | 20/07/22| 
| 1      | 2    | 28/07/22| 
| 1      | 2    | 30/07/22| 
| 1      | 3    | 04/08/22| 
| 1      | 3    | 05/08/22| 
| 2      | 2    | 26/07/22| 
| 2      | 2    | 27/07/22| 
| 2      | 3    | 04/08/22|

I need to count the consecutive dates for each ID within each week, so that the output has one row per ID per week, like this:

| ID     | week | count_consecutive | 
|------- |------| ----------------- |
| 1      | 1    | 0                 | 
| 1      | 2    | 0                 | 
| 1      | 3    | 2                 | 
| 2      | 2    | 2                 | 
| 2      | 3    | 0                 |

Any idea how to do this?

Thanks!!

CodePudding user response:

You can use groupby and apply a custom function:

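import pandas as pd

# Assumes 'date' holds DD/MM/YY strings; skip this line if it is already datetime
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')

# You can replace .eq(pd.Timedelta(days=1)) by .eq('1D'), enhanced by @mozway
# Count the one-day steps between successive dates, plus 1 to count days rather than steps
consecutive_days = lambda x: x.diff().eq(pd.Timedelta(days=1)).sum() + 1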

out = (df.groupby(['ID', 'week'])['date']
         .apply(consecutive_days)
         .where(lambda x: x > 1, other=0)
         .rename('count_consecutive').reset_index())

print(out)

# Output
   ID  week  count_consecutive
0   1     1                  0
1   1     2                  0
2   1     3                  2
3   2     2                  2
4   2     3                  0

The catch is the counting logic you asked for: a week with no consecutive days gives 0, but a week with a single pair of consecutive days gives 2, because both days are counted. As a result, the count can never be 1.
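
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assumes the dates are DD/MM/YY strings, and folds the "0 or at least 2" rule into a named helper instead of the `.where` step. The `sort_values()` call is an extra precaution in case the dates are not already ordered within each group; it is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 1, 2, 2, 2],
    "week": [1, 2, 2, 3, 3, 2, 2, 3],
    "date": ["20/07/22", "28/07/22", "30/07/22", "04/08/22",
             "05/08/22", "26/07/22", "27/07/22", "04/08/22"],
})

# Assumption: dates are day-first strings with a two-digit year
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")


def consecutive_days(dates):
    """Return 0 if no two dates are one day apart, else (one-day steps + 1)."""
    steps = dates.sort_values().diff().eq(pd.Timedelta(days=1)).sum()
    return steps + 1 if steps > 0 else 0


out = (df.groupby(["ID", "week"])["date"]
         .apply(consecutive_days)
         .rename("count_consecutive")
         .reset_index())

print(out)
```

Note that, like the lambda above, this counts all one-day steps in a week together, so two separate runs in the same week (for example days 1, 2, 4, 5) would be reported as 3 rather than as two runs of 2.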
