My dataframe looks like this:
id text labels
0 447 glutamine synthetase [protein]
1 447 GS [protein]
2 447 hepatoma [indication]
3 447 NaN NaN
4 442 Metachromatic [indication]
I want to transform the dataframe and create two new columns named proteins
and indications
that contain the text
when labels
is protein or indication for the same id.
Wanted output
id protein indication
0 447 glutamine synthetase, GS hepatoma
0 442 NaN Metachromatic
Can someone help how to do this?
CodePudding user response:
Use df.explode
with Groupby.agg
and df.pivot
:
In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot('id', 'labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)
In [423]: out.columns = ['id', 'indication', 'protein']
In [424]: out
Out[424]:
id indication protein
0 442 Metachromatic NaN
1 447 hepatoma glutamine synthetase,GS