python: structuring complete data-CodePudding

I have the following dataframe:

df = 
id date_diag   date_medication medication_code
1  01-01-2000  03-01-2000      A
2  01-01-2000  02-01-2000      A
3  01-01-2000  04-01-2000      B
4  01-01-2000  05-01-2000      B

I would like to create a table with the count of times a given medication was given after the date of the diagnoses:

df =
medication day1 day2 day3 day4 day5 day6 day7
A           0   1    1    0     0    0    0
B           0   0    0    1     1    0    0

CodePudding user response：

here is one way to do it

# create a temp fields, Seq to count the day of medication
# and days difference b/w medication and diag
# pivot
# add prefix to column
# and do cleanup


out=(df.assign(seq=1, 
           days=(pd.to_datetime(df['date_medication'], dayfirst=True).sub(pd.to_datetime(df['date_diag'], dayfirst=True))).dt.days   1)
 .pivot(index='medication_code', columns='days', values='seq')
 .fillna(0)
 .add_prefix('day')
 .reset_index()
 .rename_axis(columns=None)
)
out

    medication_code     day2    day3    day4    day5
0   A   1.0     1.0     0.0     0.0
1   B   0.0     0.0     1.0     1.0

alternately,

df['days']=pd.to_datetime(df['date_medication'], dayfirst=True).sub(
        pd.to_datetime(df['date_diag'], dayfirst=True)).dt.days   1
out=pd.crosstab(df['medication_code'], df['days']).add_prefix('day').reset_index().rename_axis(columns=None)


out

medication_code     day2    day3    day4    day5
0   A   1   1   0   0
1   B   0   0   1   1