How do I find users retention within n_days in pandas?

Time:07-31

I have a df that looks like this:

date       | user_id | purchase
2020-01-01 | 1       | 10
2020-10-01 | 1       | 12
2020-15-01 | 1       | 5
2020-11-01 | 2       | 500 ...

Now, I want to add an n_day retention flag for each user_id in my df. The expected output should look like:

date       | user_id | purchase | 3D_retention (did user purchase within next 3 days)
2020-01-01 | 1       | 10       | 0 (because there was no purchase on/before 2020-04-01 after 2020-01-01)
2020-10-01 | 1       | 12       | 1 (because there was a purchase on 2020-11-01, which was within 3 days of 2020-10-01)
2020-11-01 | 1       | 5        | 0

What is the best way of doing this in pandas?

CodePudding user response:

I changed the dates to yyyy-mm-dd format:

          date  user_id     purchase
0   2020-01-01        1     10
1   2020-01-10        1     12
2   2020-01-15        1     5
3   2020-01-11        2     500


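import pandas as pd

# Parse the date column so that date arithmetic works
df['date'] = pd.to_datetime(df['date'])

next_purchase_days = 6
# Per user, subtract each purchase date from that user's next purchase date
# and flag 1 when the gap is under next_purchase_days, else 0.
# This assumes rows are already ordered by date within each user.
df['retention'] = df.groupby('user_id')['date'].transform(lambda x: ((x.shift(-1) - x).dt.days < next_purchase_days).astype(int))
df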
          date  user_id  purchase  retention
0   2020-01-01        1        10          0
1   2020-01-10        1        12          1
2   2020-01-15        1         5          0
3   2020-01-11        2       500          0
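
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name `add_retention_flag` and its parameters are my own (hypothetical), not something from pandas. It differs from the snippet above in two ways: it sorts by user and date before comparing, and it treats "within n days" as a gap of at most `n_days` (so `n_days=5` here corresponds to `< 6` above). Adjust the comparison if your definition differs.

```python
import pandas as pd


def add_retention_flag(df, n_days, col="retention"):
    """Hypothetical helper: flag rows whose user purchased again within n_days."""
    # Sort so that each user's purchases are in chronological order
    out = df.sort_values(["user_id", "date"]).copy()
    # Gap between each purchase and the same user's next purchase (NaT for the last one)
    gap = out.groupby("user_id")["date"].shift(-1) - out["date"]
    # NaT compares as False, so a user's last purchase gets 0
    out[col] = (gap <= pd.Timedelta(days=n_days)).astype(int)
    # Restore the original row order
    return out.sort_index()


# Sample data from the answer above (dates already in yyyy-mm-dd format)
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-10", "2020-01-15", "2020-01-11"]),
    "user_id": [1, 1, 1, 2],
    "purchase": [10, 12, 5, 500],
})

result = add_retention_flag(df, n_days=5)
print(result)
```

With this data only the 2020-01-10 row for user 1 is flagged, since the next purchase by that user is 5 days later.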