I have a pandas dataframe with many columns. One of them is a series. I want to split that column into a set of boolean columns. So, if the value in a row is ['Red','Hot','Summer'], I want 3 columns: Red (having value 1), Hot (having value 1) and Summer (having value 1).
Example:
df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy'],
'Make': ['Ford', 'Ford', 'Jeep'],
'Model': ['Bronco', 'Bronco', 'Wrangler'],
'Sentiment': [['Meh','Red','Dirty'], ['Rusty','Sturdy'], ['Dirty','Red']],
'Max Speed': [80, 150, 69],
'Customer Rating': [90, 50, 91]})
gives us:
Now I want this output: (the True/False could be ones and zeros, too. Just as good).
note: I looked at this post below: Split a Pandas column of lists into multiple columns but that only directly works if your series isn't already part of a DF.
any help appreciated!
CodePudding user response:
Try explode
then corsstab
and join
s = df.Sentiment.explode()
out = df.join(pd.crosstab(s.index,s).astype(bool))
out
Owner Make Model Sentiment ... Meh Red Rusty Sturdy
0 Bob Ford Bronco [Meh, Red, Dirty] ... True True False False
1 Jane Ford Bronco [Rusty, Sturdy] ... False False True True
2 Amy Jeep Wrangler [Dirty, Red] ... False True False False
[3 rows x 11 columns]
CodePudding user response:
Try this:
df = pd.concat([df, pd.get_dummies(df['Sentiment'].explode())], axis=1)
Output:
>>> df
Owner Make Model Sentiment Max Speed Customer Rating AWESOME Dirty LOVE Meh Red Rusty Sturdy
0 Bob Ford Bronco [Meh, Red, Dirty] 80 90 0 0 0 1 0 0 0
0 Bob Ford Bronco [Meh, Red, Dirty] 80 90 0 0 0 0 1 0 0
0 Bob Ford Bronco [Meh, Red, Dirty] 80 90 0 1 0 0 0 0 0
1 Jane Ford Bronco [Rusty, Sturdy] 150 50 0 0 0 0 0 1 0
1 Jane Ford Bronco [Rusty, Sturdy] 150 50 0 0 0 0 0 0 1
2 Amy Jeep Wrangler [LOVE, AWESOME, Red] 69 91 0 0 1 0 0 0 0
2 Amy Jeep Wrangler [LOVE, AWESOME, Red] 69 91 1 0 0 0 0 0 0
2 Amy Jeep Wrangler [LOVE, AWESOME, Red] 69 91 0 0 0 0 1 0 0