I have a tricky situation. I tried my best with pivot and other methods but gave up. Please help if possible.
For each medicine column, I would like to take the row where the value is 1 and put that row's Date into the column. After this mapping, the 'Date' field is no longer needed, so I am OK with deleting it.
My sample dataset:
import pandas as pd

df1 = pd.DataFrame({'Patient': ['John', 'John', 'John', 'Smith', 'Smith', 'Smith'],
                    'Date': [20200101, 20200102, 20200105, 20220101, 20220102, 20220105],
                    'Ibrufen': ['NaN', 'NaN', 1, 'NaN', 'NaN', 1],
                    'Tylenol': [1, 'NaN', 'NaN', 1, 'NaN', 'NaN'],
                    })
My desired output:
df2 = pd.DataFrame({'Patient': ['John', 'Smith'],
                    'Ibrufen': ['20200105', '20220105'],
                    'Tylenol': ['20200101', '20220101'],
                    'Steroid': ['20200102', '20220102'],
                    })
CodePudding user response:
A possible solution, based on the idea of first creating an auxiliary column containing, for each row, the corresponding medicine:
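# Label each row with the medicine it refers to; rows with no 1 in
# either medicine column are treated as 'Steroid'.
df1['aux'] = df1.apply(
    lambda x: 'Ibrufen' if x['Ibrufen'] == 1
    else 'Tylenol' if x['Tylenol'] == 1
    else 'Steroid',
    axis=1)

# Spread the medicines into columns, using Date as the cell value.
(df1.pivot(index='Patient', columns='aux', values='Date')
    .reset_index()
    .rename_axis(None, axis=1))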
Output:
  Patient   Ibrufen   Steroid   Tylenol
0    John  20200105  20200102  20200101
1   Smith  20220105  20220102  20220101
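The auxiliary column above hard-codes the medicine names. As a variant (not part of the original answer), the same reshaping can be sketched with `melt`, which picks up whatever medicine columns exist. The names `long_df`, `taken`, `medicine` and `flag` are made up for this illustration. Note that it only produces columns that are present in the input, so with the sample data there is no 'Steroid' column, because no row is flagged for it.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Patient': ['John', 'John', 'John', 'Smith', 'Smith', 'Smith'],
    'Date': [20200101, 20200102, 20200105, 20220101, 20220102, 20220105],
    'Ibrufen': ['NaN', 'NaN', 1, 'NaN', 'NaN', 1],
    'Tylenol': [1, 'NaN', 'NaN', 1, 'NaN', 'NaN'],
})

# Long format: one row per (Patient, Date, medicine) combination.
# 'medicine' and 'flag' are illustrative column names.
long_df = df1.melt(id_vars=['Patient', 'Date'],
                   var_name='medicine', value_name='flag')

# Keep only the rows where the medicine is flagged with 1
# (the 'NaN' strings in the sample data never compare equal to 1).
taken = long_df[long_df['flag'] == 1]

# One column per medicine, with the Date as the cell value.
result = (taken.pivot(index='Patient', columns='medicine', values='Date')
               .reset_index()
               .rename_axis(None, axis=1))
print(result)
```

This assumes each patient has at most one row flagged per medicine; `pivot` raises an error on duplicate Patient/medicine pairs, in which case `pivot_table` with an aggregation such as `aggfunc='first'` might be a better fit.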