How can I convert comma-separated values into multiple columns of a pandas DataFrame, as shown in the figure?
Answer:
You can split each string on ',' and then use pd.get_dummies:
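import pandas as pd
# Split each string into a list of tokens, e.g. 's1,s2,s3' -> ['s1', 's2', 's3']
s = df['col1'].str.split(',')
# One row per token, one-hot encode, then sum back to one row per original index
df_new = pd.get_dummies(s.apply(pd.Series).stack()).groupby(level=0).sum()
print(df_new)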
Output:
   s1  s2  s3  s4  s5
0   1   1   1   0   0
1   1   1   0   0   0
2   0   1   0   0   0
3   1   0   0   0   0
4   0   0   1   0   0
5   0   0   1   1   0
6   1   0   0   0   1
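A self-contained sketch of the same idea that can be run as-is. The sample data is an assumption reconstructed from the outputs in this thread, since the question's figure is not available, and `Series.explode` (pandas 0.25+) is swapped in for `s.apply(pd.Series).stack()`. Both give one row per token while keeping the original row index, which is what `groupby(level=0)` relies on.

```python
import pandas as pd

# Assumed sample input, reconstructed from the outputs shown in this thread.
df = pd.DataFrame({'col1': ['s1,s2,s3', 's1,s2', 's2', 's1', 's3', 's3,s4', 's1,s5']})

# Split each string into a list, then give every token its own row.
# explode() repeats the original index, so each token remembers its source row.
tokens = df['col1'].str.split(',').explode()

# One-hot encode the tokens and collapse back to one row per original index.
df_new = pd.get_dummies(tokens).groupby(level=0).sum().astype(int)
print(df_new)
```

The `.astype(int)` cast just guarantees plain integer columns, since recent pandas versions return boolean dummies from `pd.get_dummies`.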
Answer:
Use str.get_dummies directly, and join the result to the original DataFrame if you want to keep the original column(s):
# str.get_dummies splits on sep and one-hot encodes each token; join aligns on the index
out = df.join(df['col1'].str.get_dummies(sep=','))
Output:
       col1  s1  s2  s3  s4  s5
0  s1,s2,s3   1   1   1   0   0
1     s1,s2   1   1   0   0   0
2        s2   0   1   0   0   0
3        s1   1   0   0   0   0
4        s3   0   0   1   0   0
5     s3,s4   0   0   1   1   0
6     s1,s5   1   0   0   0   1
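A runnable version of this answer, again assuming sample data reconstructed from the output shown here. `Series.str.get_dummies` splits each string on `sep` and returns one 0/1 column per distinct token, so no explicit split or regrouping step is needed.

```python
import pandas as pd

# Assumed sample input, reconstructed from the output shown in this thread.
df = pd.DataFrame({'col1': ['s1,s2,s3', 's1,s2', 's2', 's1', 's3', 's3,s4', 's1,s5']})

# Split each string on ',' and build one indicator column per distinct token.
dummies = df['col1'].str.get_dummies(sep=',')

# join aligns on the index, so the original column is kept next to the indicators.
out = df.join(dummies)
print(out)
```

If you only need the indicator columns, skip the `join` and use the result of `str.get_dummies` on its own.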