I have a dataframe that has user IDs, which week they've logged on and how many sessions they had during that week. It looks similar to this:
ID | Week | Sessions |
---|---|---|
1 | 1 | 1 |
1 | 2 | 10 |
1 | 3 | 0 |
1 | 4 | 0 |
1 | 5 | 0 |
1 | 6 | 1 |
2 | 1 | 5 |
2 | 2 | 7 |
2 | 3 | 9 |
2 | 4 | 1 |
2 | 5 | 0 |
2 | 6 | 3 |
2 | 7 | 2 |
3 | 1 | 8 |
3 | 2 | 2 |
3 | 3 | 0 |
3 | 4 | 0 |
3 | 5 | 0 |
3 | 6 | 0 |
3 | 7 | 0 |
I am looking to find out if a user has churned. Churn here is if in the last 3 weeks of a user's lifetime, they haven't had any session. In the example above it means that only user 3 churned (as the first user had a session at their last week).
Ideally the dataframe would look like this:
ID | Week | Sessions | Churn Session | Churned_user |
---|---|---|---|---|
3 | 1 | 8 | 0 | 1 |
3 | 2 | 2 | 0 | 1 |
3 | 3 | 0 | 0 | 1 |
3 | 4 | 0 | 0 | 1 |
3 | 5 | 0 | 1 | 1 |
3 | 6 | 0 | 0 | 1 |
3 | 7 | 0 | 0 | 1 |
How can I replicate the above?
Thank you.
CodePudding user response:
You can sum the last 3 session records of each user ID then if the sum is 0 keep the user:
is_churned = lambda x: x.iloc[-3:].sum() == 0
out = df[df.groupby('ID')['Sessions'].transform(is_churned)]
print(out)
# Output
ID Week Sessions
13 3 1 8
14 3 2 2
15 3 3 0
16 3 4 0
17 3 5 0
18 3 6 0
19 3 7 0
CodePudding user response:
This sounds like a case for .groupby(['ID'])['Sessions'].rolling(3).sum() as described in How to perform rolling sum on pandas dataframe with group by for last 365 days only