Home > database >  Calculate date differences, max and min date, and date counts using Pandas dataframes
Calculate date differences, max and min date, and date counts using Pandas dataframes

Time:11-12

I have the following dataset, with multiple patient_IDs and treatment Dates per patient_ID:

Patient_ID Dates
2038 2012-01-02
2038 2012-02-13
2038 2012-02-27
2120 2005-02-05
2120 2009-03-31

What I want to do is:

A) extract per patient the total duration under treatment, then

B) save the maximum and minimum treatment Date (per patient_ID), and

C) finally count the number of treatments (Dates) per patient_ID.

Here is what I tried so far:

df=Patient_Dates
df['Dates'] = pd.to_datetime(df['Dates'])
grouped = df.groupby(['Patient_ID'])
op = []
for Name, group in grouped:
    LengthOfTreatment_days = group['Dates'].max() - group['Dates'].min()
    op.append([Name, LengthOfTreatment_days])

This counts A) in days fine. However, I can't find a way to perform B) and C) above.

Any ideas would be greatly appreciated.

CodePudding user response:

Try with groupby and agg:

output = df.groupby("Patient_ID")["Dates"].agg(max_date="max", min_date="min", num_treatments="count")
output["duration"] = (output["max_date"]-output["min_date"]).dt.days

>>> output
             max_date   min_date  num_treatments  duration
Patient_ID                                                
2038       2012-02-27 2012-01-02               3        56
2120       2009-03-31 2005-02-05               2      1515
  • Related