Calculate date differences, max and min date, and date counts using Pandas dataframes-CodePudding

I have the following dataset, with multiple patient_IDs and treatment Dates per patient_ID:

Patient_ID	Dates
2038	2012-01-02
2038	2012-02-13
2038	2012-02-27
2120	2005-02-05
2120	2009-03-31

What I want to do is:

A) extract per patient the total duration under treatment, then

B) save the maximum and minimum treatment Date (per patient_ID), and

C) finally count the number of treatments (Dates) per patient_ID.

Here is what I tried so far:

df=Patient_Dates
df['Dates'] = pd.to_datetime(df['Dates'])
grouped = df.groupby(['Patient_ID'])
op = []
for Name, group in grouped:
    LengthOfTreatment_days = group['Dates'].max() - group['Dates'].min()
    op.append([Name, LengthOfTreatment_days])

This counts A) in days fine. However, I can't find a way to perform B) and C) above.

Any ideas would be greatly appreciated.

CodePudding user response：

Try with groupby and agg:

output = df.groupby("Patient_ID")["Dates"].agg(max_date="max", min_date="min", num_treatments="count")
output["duration"] = (output["max_date"]-output["min_date"]).dt.days

>>> output
             max_date   min_date  num_treatments  duration
Patient_ID                                                
2038       2012-02-27 2012-01-02               3        56
2120       2009-03-31 2005-02-05               2      1515