I have a string column containing something like this
Col1 |
---|
ind1,ind2,ind3 |
ind1,ind5,ind3 |
ind2,ind3,ind5,ind4 |
I want to split it to the following columns:
ind_1 | ind_2 | ind_3 | ind_4 | ind_5 |
---|---|---|---|---|
ind1 | ind2 | ind3 | ||
ind1 | ind3 | ind5 | ||
ind2 | ind3 | ind4 | ind5 |
using .str.rsplit(',', expand=True)
does not order the same strings in the same column.
CodePudding user response:
Explode your column then pivot your dataframe:
out = df['Col1'].str.split(',').explode().reset_index()
out = (out.pivot('index', 'Col1', 'Col1').fillna('')
.rename_axis(index=None, columns=None))
print(out)
# Output
ind1 ind2 ind3 ind4 ind5
0 ind1 ind2 ind3
1 ind1 ind3 ind5
2 ind2 ind3 ind4 ind5
CodePudding user response:
use df.column.str.get_dummies with seperator as ","
import pandas as pd
df = pd.DataFrame({
"col1" : ["ind1,ind2,ind3", "ind1,ind5,ind3", "ind2,ind3,ind5,ind4"]
})
df.head()
# output
col1
0 ind1,ind2,ind3
1 ind1,ind5,ind3
2 ind2,ind3,ind5,ind4
df = pd.concat([df,df.col1.str.get_dummies(sep = ",")], axis =1)
df
# output
col1 ind1 ind2 ind3 ind4 ind5
0 ind1,ind2,ind3 1 1 1 0 0
1 ind1,ind5,ind3 1 0 1 0 1
2 ind2,ind3,ind5,ind4 0 1 1 1 1
CodePudding user response:
I think I found the solution and even return binary result:
df.join(df.Col1.str.get_dummies(',').apply(lambda x: np.where(x == 1, 1, 0))).drop(columns=['Col1'])