I'm wondering if there's a better way to split a string column into multiple rows by 'comma' and send the column from long to wide. My current approach is only working to some extent.
data = {'id':["ab3e3", "psdds2", "pas13", "ccdf2", "dsda1"],
'fruit':["apple, organge", "others", "dragon fruit, organge", "watermelon", "others"]}
df = pd.DataFrame(data)
lst_col = 'fruit'
x = df.assign(**{lst_col:df[lst_col].str.split(',')})
dt = x.explode('fruit')
dt['counts'] = 1
dt.pivot(index='id', columns='fruit', values = 'counts')
id fruit
0 ab3e3 apple, organge
1 psdds2 others
2 pas13 dragon fruit, organge
3 ccdf2 watermelon
4 dsda1 others
Expected output:
id organge apple dragon fruit others watermelon
ab3e3 1 1 0 0 0
ccdf2 0 0 0 0 1
dsda1 0 0 0 1 0
pas13 1 0 1 0 0
psdds2 0 0 0 1 0
CodePudding user response:
You can use pd.Series.str.split
and crosstab
:
s = df.assign(fruit=df.fruit.str.split(", ")).explode("fruit")
print (pd.crosstab(s.id, s.fruit))
fruit apple dragon fruit organge others watermelon
id
ab3e3 1 0 1 0 0
ccdf2 0 0 0 0 1
dsda1 0 0 0 1 0
pas13 0 1 1 0 0
psdds2 0 0 0 1 0
CodePudding user response:
Use .str.get_dummies()
to get a dummy table from the fruit
column. Then, join
df
with this dummy table, as follows:
df.drop('fruit', axis=1).join(df['fruit'].str.get_dummies(', '))
Result:
id apple dragon fruit organge others watermelon
0 ab3e3 1 0 1 0 0
1 psdds2 0 0 0 1 0
2 pas13 0 1 1 0 0
3 ccdf2 0 0 0 0 1
4 dsda1 0 0 0 1 0