Home > Software design >  Pandas dataframe, check list in column an set value in different column
Pandas dataframe, check list in column an set value in different column

Time:01-18

I need your help for the following task: I have the following dataframe:

test = {'Col1':[2,5],
        'Col2':[5,7],
        'Col_List':[['One','Two','Three','Four','Five'], ['Two', 'Four']],
       'One':[0,0],
       'Two':[0,0],
       'Three':[0,0],
       'Four':[0,0],
       'Five':[0,0],}

df=pd.DataFrame.from_dict(test)
df

which looks like:

Col1 Col2 Col_List One Two Three Four Five
2 5 [One, Two, Three, Four, Five] 0 0 0 0 0
5 7 [Two, Four] 0 0 0 0 0

I need to inspect the list in Col_List and set, depending on which item is in the list, the value of column Col1 in the specific column (One, Two, Three, Four or Five).

Now I would like to have the following result:

Col1 Col2 Col_List One Two Three Four Five
2 5 [One, Two, Three, Four, Five] 2 2 2 2 2
5 7 [Two, Four] 0 5 0 5 0

CodePudding user response:

exploded = df.explode("Col_List")

df.update(pd.get_dummies(exploded["Col_List"])
            .mul(exploded["Col1"], axis="rows")
            .groupby(level=0).sum())
  • explode lists' elements to their own rows
  • get 1-hot representation of "One", "Two" etc.
  • multiply it with the (exploded) "Col1" values
    • 1/0 values will act as a selector
  • then undo the explosion: groupby & sum
  • lastly update the original frame's "One", "Two"... columns with this

to get

>>> df

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0

CodePudding user response:

Let's try explode the data, unstack it, then assign back:

s = (df[['Col1', 'Col_List']].explode('Col_List')
   .set_index('Col_List', append=True)['Col1']
   .unstack().fillna(df)
)

df.loc[s.index, s.columns] = s

Output:

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]  2.0  2.0    2.0   2.0   2.0
1     5     7                    [Two, Four]  0.0  5.0    0.0   5.0   0.0

CodePudding user response:

You could use .str.join() and str.get_dummies()

df.assign(**df['Col_List'].str.join('|').str.get_dummies().mul(df['Col1'],axis=0))

Output:

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
  • Related