Home > Enterprise >  Pandas-How I find the lag difference days in treatment for each patient?
Pandas-How I find the lag difference days in treatment for each patient?

Time:12-21

Please help. In my sample, each patient has multiple treatments and each treatment start the specific day. My objective here is to compute the difference in each treatment.

Also, all my patients are in a single column. Once a new patient comes, the lag difference has to be reset.

My current dataset format:

df2 = pd.DataFrame({'patient': ['one', 'one', 'one', 'two','two', 'two'],    
...:                     'treatment_schedule': ['treatment1', 'treatment2', 'treatment3', 'treatment1', 'treatment2', 'treatment3'],        
...:                     'date': ['11/20/2022', '11/22/2022', '11/23/2022', '11/8/2022', '11/9/2022', '11/14/2022']})
 
 df2

My desired dataset format:

df3 = pd.DataFrame({'patient': ['one', 'one', 'one', 'two','two', 'two'],    
...:                     'treatment_schedule': ['treatment1', 'treatment2', 'treatment3', 'treatment1', 'treatment2', 'treatment3'],    
...:                     'date': ['11/20/2022', '11/22/2022', '11/23/2022', '11/8/2022', '11/9/2022', '11/14/2022'],   
...:                     'lag_diff_days_between_each_treatment':[0, 2, 1, 0, 1, 5]})     ##### If no prior values for patient one, then like to see either null or zero
 
 df3 

CodePudding user response:

s = (pd.to_datetime(df2['date']).groupby(df2['patient']).diff(1)
     .div(pd.Timedelta('1day')).fillna(0).astype('int'))
df3 = df2.assign(lag_diff_days_between_each_treatment=s)

df3

    patient treatment_schedule  date     lag_diff_days_between_each_treatment
0   one     treatment1          11/20/2022  0
1   one     treatment2          11/22/2022  2
2   one     treatment3          11/23/2022  1
3   two     treatment1          11/8/2022   0
4   two     treatment2          11/9/2022   1
5   two     treatment3          11/14/2022  5

CodePudding user response:

Use DataFrameGroupBy.diff with convert timedeltas to days bySeries.dt.days and replace missing values to 0 by Series.fillna:

df2['date'] = pd.to_datetime(df2['date'])

df2['lag_diff_days_between_each_treatment'] = (df2.groupby('patient')['date']
                                                  .diff()
                                                  .dt.days
                                                  .fillna(0, downcast='int'))
print (df2)
  patient treatment_schedule       date  lag_diff_days_between_each_treatment
0     one         treatment1 2022-11-20                                     0
1     one         treatment2 2022-11-22                                     2
2     one         treatment3 2022-11-23                                     1
3     two         treatment1 2022-11-08                                     0
4     two         treatment2 2022-11-09                                     1
5     two         treatment3 2022-11-14                                     5
     
  • Related