I want to transform my dataset such way:
Input:
id | drug1_drug | drug2_drug | drug3_drug | age |
---|---|---|---|---|
1 | 0 | 1 | 0 | 33 |
1 | 0 | 1 | 0 | 33 |
1 | 0 | 1 | 0 | 33 |
2 | 0 | 1 | 1 | 45 |
2 | 0 | 1 | 1 | 45 |
3 | 1 | 1 | 0 | 66 |
3 | 1 | 1 | 0 | 66 |
3 | 1 | 1 | 0 | 66 |
4 | 1 | 0 | 1 | 28 |
4 | 1 | 0 | 1 | 28 |
5 | 1 | 0 | 0 | 87 |
5 | 1 | 0 | 0 | 87 |
Output:
id | age | therapy |
---|---|---|
1 | 33 | drug2 |
2 | 45 | drug2,drug3 |
3 | 66 | drug1,drug2 |
4 | 28 | drug1,drug3 |
5 | 87 | drug1 |
Column names with _drug are constant for a particular id. And i need if it '1' put the name of the drug to the new table. Iam not so good at pandas, but i understand that we need to use pivot_table
, but in what way i really dont know. I would be very grateful for answering my questions!
CodePudding user response:
IIUC, use:
(df.set_index(['id', 'age'])
.rename(columns=lambda x: x.replace('_drug', ''))
.replace(0, float('nan'))
.stack()
.reset_index()
.groupby(['id', 'age'], as_index=False)
.agg(therapy=('level_2', lambda x: ','.join(dict.fromkeys(x))))
)
output:
id age therapy
0 1 33 drug2
1 2 45 drug2,drug3
2 3 66 drug1,drug2
3 4 28 drug1,drug3
4 5 87 drug1
CodePudding user response:
Let's try
out = (df.drop_duplicates(df.columns)
.melt(id_vars=['id', 'age'], var_name='therapy')
.query('value == 1')
.drop(columns='value')
.pipe(lambda d: d.assign(therapy=d['therapy'].str.rstrip('_drug')))
.groupby(['id', 'age'], as_index=False)['therapy']
.agg(','.join))
print(out)
id age therapy
0 1 33 drug2
1 2 45 drug2,drug3
2 3 66 drug1,drug2
3 4 28 drug1,drug3
4 5 87 drug1