Home > Net >  How to split string column into categorical columns in pandas
How to split string column into categorical columns in pandas

Time:06-13

I have a string column containing something like this

Col1
ind1,ind2,ind3
ind1,ind5,ind3
ind2,ind3,ind5,ind4

I want to split it to the following columns:

ind_1 ind_2 ind_3 ind_4 ind_5
ind1 ind2 ind3
ind1 ind3 ind5
ind2 ind3 ind4 ind5

using .str.rsplit(',', expand=True) does not order the same strings in the same column.

CodePudding user response:

Explode your column then pivot your dataframe:

out = df['Col1'].str.split(',').explode().reset_index()
out = (out.pivot('index', 'Col1', 'Col1').fillna('')
          .rename_axis(index=None, columns=None))
print(out)

# Output
   ind1  ind2  ind3  ind4  ind5
0  ind1  ind2  ind3            
1  ind1        ind3        ind5
2        ind2  ind3  ind4  ind5

CodePudding user response:

use df.column.str.get_dummies with seperator as ","

import pandas as pd

df = pd.DataFrame({
    "col1" : ["ind1,ind2,ind3", "ind1,ind5,ind3", "ind2,ind3,ind5,ind4"]
})
df.head()

# output
    col1
0   ind1,ind2,ind3
1   ind1,ind5,ind3
2   ind2,ind3,ind5,ind4

df  = pd.concat([df,df.col1.str.get_dummies(sep = ",")], axis =1)
df

# output
    col1                ind1    ind2    ind3    ind4    ind5
0   ind1,ind2,ind3      1       1       1       0       0
1   ind1,ind5,ind3      1       0       1       0       1
2   ind2,ind3,ind5,ind4 0       1       1       1       1

CodePudding user response:

I think I found the solution and even return binary result:

df.join(df.Col1.str.get_dummies(',').apply(lambda x: np.where(x == 1, 1, 0))).drop(columns=['Col1'])
  • Related