I have an exercise in which I need to turn few or several rows into one row if they have the same data in three columnes.
substances = pd.DataFrame({'id': ['id_1', 'id_1', 'id_1', 'id_2', 'id_3'],
'part': ['1', '1', '2', '2', '3'],
'sub': ['paracetamolum', 'paracetamolum', 'ibuprofenum', 'dienogestum', 'etynyloestradiol'],
'strength': ['150', '50', '50', '20', '30'],
'unit' : ['mg', 'mg', 'mg', 'mg', 'mcg'],
'other irrelevant columns for this task' : ['sth1' , 'sth2', 'sth3', 'sth4', 'sth5']
})
Now provided that id, part and substance is the same, I am supposed to make it into one row, so the end result is:
id | part | strength | substance | unit |
---|---|---|---|---|
id_1 | 1 | '150 # 50' | paracetamolum | mg |
id_1 | 2 | 50 | ibuprofenum | mg |
id_2 | 2 | 20 | dienogestum | mg |
id_3 | 3 | 30 | etynyloestradiol | mcg |
The issue I have is that I have problem joining these rows into one row to show possible strength like this '150 # 50' I have tried to something like this, but it is not going great:
substances = substances.groupby('id', 'part', 'sub', 'strength').id.apply(lambda x: str(substances['strength']) ' # ' str(next(substances['strength'])))
CodePudding user response:
df = df.groupby(['id','part','sub','unit']).agg({'strength':' # '.join}).reset_index()
df = df[['id','part','strength', 'sub','unit']]
print(df)
output:
id part strength sub unit
0 id_1 1 150 # 50 paracetamolum mg
1 id_1 2 50 ibuprofenum mg
2 id_2 2 20 dienogestum mg
3 id_3 3 30 etynyloestradiol mcg