Home > other >  How can i transform the dataset using pivot table?
How can i transform the dataset using pivot table?

Time:08-25

I want to transform my dataset such way:

Input:

id drug1_drug drug2_drug drug3_drug age
1 0 1 0 33
1 0 1 0 33
1 0 1 0 33
2 0 1 1 45
2 0 1 1 45
3 1 1 0 66
3 1 1 0 66
3 1 1 0 66
4 1 0 1 28
4 1 0 1 28
5 1 0 0 87
5 1 0 0 87

Output:

id age therapy
1 33 drug2
2 45 drug2,drug3
3 66 drug1,drug2
4 28 drug1,drug3
5 87 drug1

Column names with _drug are constant for a particular id. And i need if it '1' put the name of the drug to the new table. Iam not so good at pandas, but i understand that we need to use pivot_table, but in what way i really dont know. I would be very grateful for answering my questions!

CodePudding user response:

IIUC, use:

(df.set_index(['id', 'age'])
   .rename(columns=lambda x: x.replace('_drug', ''))
   .replace(0, float('nan'))
   .stack()
   .reset_index()
   .groupby(['id', 'age'], as_index=False)
   .agg(therapy=('level_2', lambda x: ','.join(dict.fromkeys(x))))
)

output:

   id  age      therapy
0   1   33        drug2
1   2   45  drug2,drug3
2   3   66  drug1,drug2
3   4   28  drug1,drug3
4   5   87        drug1

CodePudding user response:

Let's try

out = (df.drop_duplicates(df.columns)
       .melt(id_vars=['id', 'age'], var_name='therapy')
       .query('value == 1')
       .drop(columns='value')
       .pipe(lambda d: d.assign(therapy=d['therapy'].str.rstrip('_drug')))
       .groupby(['id', 'age'], as_index=False)['therapy']
       .agg(','.join))
print(out)

   id  age      therapy
0   1   33        drug2
1   2   45  drug2,drug3
2   3   66  drug1,drug2
3   4   28  drug1,drug3
4   5   87        drug1
  • Related