I have a pandas df as follows, and my goal is to expand the MATCHUP
column into several dummy columns.
INDICATOR  MATCHUP
1          [ "APPLE", "GRAPE" ]
1          [ "APPLE", "GRAPE" ]
0          [ "GRAPE", "BANANA" ]
0          [ "PEAR", "ORANGE" ]
1          [ "ORANGE", "APPLE" ]
Here's a dict of how it looks:
{'INDICATOR': [1, 1, 0, 0, 1],
'MATCHUP': ['[ "APPLE", "GRAPE" ]',
'[ "APPLE", "GRAPE" ]',
'[ "GRAPE", "BANANA" ]',
'[ "PEAR", "ORANGE" ]',
'[ "ORANGE", "APPLE" ]']}
So given this df, I would like to create dummy variables that indicate whether a value appears in MATCHUP.
Final outcome:
INDICATOR  MATCHUP                APPLE  GRAPE  BANANA  PEAR  ORANGE
1          [ "APPLE", "GRAPE" ]   1      1      0       0     0
1          [ "APPLE", "GRAPE" ]   1      1      0       0     0
0          [ "GRAPE", "BANANA" ]  0      1      1       0     0
0          [ "PEAR", "ORANGE" ]   0      0      0       1     1
1          [ "ORANGE", "APPLE" ]  1      0      0       0     1
Is there a way to accomplish this using pandas? I attempted to accomplish this using this, but I think the spacing in the MATCHUP column makes this method unviable.
CodePudding user response:
Check explode with str.get_dummies. The MATCHUP values are strings, not lists, so parse them with ast.literal_eval first:
import ast
df = df.join(df['MATCHUP'].map(ast.literal_eval).explode().str.get_dummies().groupby(level=0).sum())
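If the one-liner is hard to follow, here is a rough step-by-step sketch of the same idea, built on the dict from the question. The intermediate names (parsed, exploded, dummies, result) are only for illustration. One deliberate change, which is my assumption and not part of the one-liner above: the last step uses max() instead of sum(), so a value stays 0/1 even if the same name shows up twice in one row. With the sample data both give the same result.

```python
import ast

import pandas as pd

# Sample data copied from the question; MATCHUP holds strings, not lists.
df = pd.DataFrame({
    'INDICATOR': [1, 1, 0, 0, 1],
    'MATCHUP': ['[ "APPLE", "GRAPE" ]',
                '[ "APPLE", "GRAPE" ]',
                '[ "GRAPE", "BANANA" ]',
                '[ "PEAR", "ORANGE" ]',
                '[ "ORANGE", "APPLE" ]'],
})

# Step 1: parse each string into a real Python list.
# literal_eval handles the extra spaces inside the brackets.
parsed = df['MATCHUP'].map(ast.literal_eval)

# Step 2: one row per list element; the original index label is repeated.
exploded = parsed.explode()

# Step 3: one indicator column per distinct value.
dummies = exploded.str.get_dummies()

# Step 4: collapse back to one row per original index label.
# max() (assumption, instead of sum()) keeps the indicators binary.
dummies = dummies.groupby(level=0).max()

# Step 5: attach the indicator columns to the original frame by index.
result = df.join(dummies)
print(result)
```

Note that get_dummies returns the new columns in sorted order (APPLE, BANANA, GRAPE, ORANGE, PEAR), so the column order differs from the layout in the question. Reorder with result[[...]] if that matters.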