Home > Software design >  Pandas : Create new column based on text values of other columns
Pandas : Create new column based on text values of other columns

Time:05-14

My dataframe looks like this:

    id          text                    labels
0   447         glutamine synthetase    [protein]
1   447         GS                      [protein]
2   447         hepatoma                [indication]
3   447         NaN                      NaN
4   442         Metachromatic           [indication]

I want to transform the dataframe and create two new columns named proteins and indications that contain the text when labels is protein or indication for the same id.

Wanted output

    id          protein                     indication
0   447         glutamine synthetase, GS    hepatoma
0   442         NaN                         Metachromatic

Can someone help how to do this?

CodePudding user response:

Use df.explode with Groupby.agg and df.pivot:

In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot('id', 'labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)

In [423]: out.columns = ['id', 'indication', 'protein']

In [424]: out
Out[424]: 
    id     indication                  protein
0  442  Metachromatic                      NaN
1  447       hepatoma  glutamine synthetase,GS
  • Related