Home > Net >  Improve splitting and exploding a string column at comma and pivot it from long to wide in Pandas
Improve splitting and exploding a string column at comma and pivot it from long to wide in Pandas

Time:10-18

I'm wondering if there's a better way to split a string column into multiple rows by 'comma' and send the column from long to wide. My current approach is only working to some extent.

data = {'id':["ab3e3", "psdds2", "pas13", "ccdf2", "dsda1"],
        'fruit':["apple, organge", "others", "dragon fruit, organge", "watermelon", "others"]}

df = pd.DataFrame(data)

lst_col = 'fruit'
x = df.assign(**{lst_col:df[lst_col].str.split(',')})
dt = x.explode('fruit')
dt['counts'] = 1
dt.pivot(index='id', columns='fruit', values = 'counts')

       id                  fruit
0   ab3e3         apple, organge
1  psdds2                 others
2   pas13  dragon fruit, organge
3   ccdf2             watermelon
4   dsda1                 others

Expected output:

id        organge  apple  dragon fruit  others  watermelon
ab3e3         1      1             0       0           0 
ccdf2         0      0             0       0           1
dsda1         0      0             0       1           0 
pas13         1      0             1       0           0 
psdds2        0      0             0       1           0 

CodePudding user response:

You can use pd.Series.str.split and crosstab:

s = df.assign(fruit=df.fruit.str.split(", ")).explode("fruit")
print (pd.crosstab(s.id, s.fruit))

fruit   apple  dragon fruit  organge  others  watermelon
id                                                      
ab3e3       1             0        1       0           0
ccdf2       0             0        0       0           1
dsda1       0             0        0       1           0
pas13       0             1        1       0           0
psdds2      0             0        0       1           0

CodePudding user response:

Use .str.get_dummies() to get a dummy table from the fruit column. Then, join df with this dummy table, as follows:

df.drop('fruit', axis=1).join(df['fruit'].str.get_dummies(', '))

Result:

       id  apple  dragon fruit  organge  others  watermelon
0   ab3e3      1             0        1       0           0
1  psdds2      0             0        0       1           0
2   pas13      0             1        1       0           0
3   ccdf2      0             0        0       0           1
4   dsda1      0             0        0       1           0
  • Related