I have a data frame about user activity that looks like this:
| ID | week| date |
|------- |------| ------- |
| 1 | 1 | 20/07/22|
| 1 | 2 | 28/07/22|
| 1 | 2 | 30/07/22|
| 1 | 3 | 04/08/22|
| 1 | 3 | 05/08/22|
| 2 | 2 | 26/07/22|
| 2 | 2 | 27/07/22|
| 2 | 3 | 04/08/22|
I need to count the consecutive dates for each ID within each week, with one row per ID per week, so the output will look like this:
| ID | week | count_consecutive |
|------- |------| ----------------- |
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 2 |
| 2 | 2 | 2 |
| 2 | 3 | 0 |
Any idea how to do this?
Thanks!!
CodePudding user response:
You can use groupby and apply with a custom function:
import pandas as pd

# Assumes df['date'] is already datetime, e.g. pd.to_datetime(df['date'], format='%d/%m/%y')
# You can replace .eq(pd.Timedelta(days=1)) with .eq('1D') (suggested by @mozway)
consecutive_days = lambda x: x.diff().eq(pd.Timedelta(days=1)).sum() + 1
out = (df.groupby(['ID', 'week'])['date']
         .apply(consecutive_days)
         .where(lambda x: x > 1, other=0)
         .rename('count_consecutive').reset_index())
print(out)
# Output
ID week count_consecutive
0 1 1 0
1 1 2 0
2 1 3 2
3 2 2 2
4 2 3 0
The catch here is the counting logic you chose: a group gets 0 if it has no consecutive days, but 2 as soon as it has one pair of consecutive days. So the result can never contain 1.
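If you want to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and makes two assumptions that are not stated in the thread: the dates are day-first strings that need parsing with pd.to_datetime, and they should be sorted within each group before diff() is applied. The helper name count_consecutive is only illustrative, and it folds the "+ 1, then zero out anything not greater than 1" step into one function instead of using where.

```python
import pandas as pd

# Sample data copied from the question; dates are day-first strings.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 1, 2, 2, 2],
    "week": [1, 2, 2, 3, 3, 2, 2, 3],
    "date": ["20/07/22", "28/07/22", "30/07/22", "04/08/22",
             "05/08/22", "26/07/22", "27/07/22", "04/08/22"],
})

# Assumption: parse as day/month/two-digit year so diff() yields Timedeltas.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")
# Assumption: dates should be ordered within each group before diffing.
df = df.sort_values(["ID", "week", "date"])


def count_consecutive(dates: pd.Series) -> int:
    """Return (number of one-day gaps + 1), or 0 when there is no one-day gap."""
    one_day_gaps = int(dates.diff().eq(pd.Timedelta(days=1)).sum())
    return one_day_gaps + 1 if one_day_gaps > 0 else 0


out = (df.groupby(["ID", "week"])["date"]
         .apply(count_consecutive)
         .rename("count_consecutive")
         .reset_index())
print(out)
```

On the sample data this should give the same count_consecutive values as the table in the question (0, 0, 2, 2, 0). Note that it counts all one-day gaps in a group together, so two separate runs in the same week would be merged into one number.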