Making few columns into one column if certain conditions are fulfilled-CodePudding

I have an exercise in which I need to turn few or several rows into one row if they have the same data in three columnes.

substances = pd.DataFrame({'id': ['id_1', 'id_1', 'id_1', 'id_2', 'id_3'],
     'part': ['1', '1', '2', '2', '3'],
     'sub': ['paracetamolum', 'paracetamolum', 'ibuprofenum', 'dienogestum', 'etynyloestradiol'],
     'strength': ['150', '50', '50', '20', '30'],
     'unit' : ['mg', 'mg', 'mg', 'mg', 'mcg'],
     'other irrelevant columns for this task' : ['sth1' , 'sth2', 'sth3', 'sth4', 'sth5']
      })

Now provided that id, part and substance is the same, I am supposed to make it into one row, so the end result is:

id	part	strength	substance	unit
id_1	1	'150 # 50'	paracetamolum	mg
id_1	2	50	ibuprofenum	mg
id_2	2	20	dienogestum	mg
id_3	3	30	etynyloestradiol	mcg

The issue I have is that I have problem joining these rows into one row to show possible strength like this '150 # 50' I have tried to something like this, but it is not going great:

substances = substances.groupby('id', 'part', 'sub', 'strength').id.apply(lambda x: str(substances['strength'])   ' # '   str(next(substances['strength'])))

CodePudding user response：

df = df.groupby(['id','part','sub','unit']).agg({'strength':' # '.join}).reset_index()
df = df[['id','part','strength', 'sub','unit']]
print(df)

output:

     id part  strength               sub unit
0  id_1    1  150 # 50     paracetamolum   mg
1  id_1    2        50       ibuprofenum   mg
2  id_2    2        20       dienogestum   mg
3  id_3    3        30  etynyloestradiol  mcg