I have the following dataset, with multiple patient_IDs and treatment Dates per patient_ID:
Patient_ID | Dates |
---|---|
2038 | 2012-01-02 |
2038 | 2012-02-13 |
2038 | 2012-02-27 |
2120 | 2005-02-05 |
2120 | 2009-03-31 |
What I want to do is:
A) extract per patient the total duration under treatment, then
B) save the maximum and minimum treatment Date (per patient_ID), and
C) finally count the number of treatments (Dates) per patient_ID.
Here is what I tried so far:
df=Patient_Dates
df['Dates'] = pd.to_datetime(df['Dates'])
grouped = df.groupby(['Patient_ID'])
op = []
for Name, group in grouped:
LengthOfTreatment_days = group['Dates'].max() - group['Dates'].min()
op.append([Name, LengthOfTreatment_days])
This counts A) in days fine. However, I can't find a way to perform B) and C) above.
Any ideas would be greatly appreciated.
CodePudding user response:
Try with groupby
and agg
:
output = df.groupby("Patient_ID")["Dates"].agg(max_date="max", min_date="min", num_treatments="count")
output["duration"] = (output["max_date"]-output["min_date"]).dt.days
>>> output
max_date min_date num_treatments duration
Patient_ID
2038 2012-02-27 2012-01-02 3 56
2120 2009-03-31 2005-02-05 2 1515