Here is what my CSV looks like:
time | cause |
---|---|
23 | a / b / c |
42 | c / d / a / b |
12 | a / d / e |
98 | c / b / e / d |
And this is the output I am trying to achieve:
time | a | b | c | d | e |
---|---|---|---|---|---|
23 | 1 | 1 | 1 | 0 | 0 |
42 | 1 | 1 | 1 | 1 | 0 |
12 | 1 | 0 | 0 | 1 | 1 |
98 | 0 | 1 | 1 | 1 | 1 |
My real data is much larger, but this example should be enough to show what I am looking for. I cannot figure out how to use the `map` function to check for multiple possible values in every cell.
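To reproduce the example, here is a minimal sketch that loads the sample data. It assumes the real file is a comma-separated CSV with a header row and that `cause` holds values separated by ` / `; the question only shows the data as a table, so the actual delimiter is an assumption. The inline `csv_text` string is a hypothetical stand-in for the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: assumes comma-separated
# columns, with 'cause' values separated by ' / ' inside each cell.
csv_text = """time,cause
23,a / b / c
42,c / d / a / b
12,a / d / e
98,c / b / e / d
"""

# With a real file, pass its path to read_csv instead of a StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```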
Answer:
You can use `str.get_dummies` and `join` the result back to the original dataframe:
```python
df[['time']].join(df['cause'].str.get_dummies(sep=' / '))
```
Or, to modify the original dataframe, use `pop`, which removes the `cause` column from `df` and returns it:

```python
df = df.join(df.pop('cause').str.get_dummies(sep=' / '))
```
Output:

```
   time  a  b  c  d  e
0    23  1  1  1  0  0
1    42  1  1  1  1  0
2    12  1  0  0  1  1
3    98  0  1  1  1  1
```
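Since the question asks specifically about `map`, here is a sketch of a `map`-based alternative. It is not part of the answer above; it just makes the per-cell membership test explicit. It splits each cell once into a set, collects all distinct causes, and then uses `map` once per cause to turn "is this cause in the cell?" into 1 or 0. The ` / ` separator is taken from the example data. `str.get_dummies` does the same thing in a single call.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [23, 42, 12, 98],
    "cause": ["a / b / c", "c / d / a / b", "a / d / e", "c / b / e / d"],
})

# Split each cell once into a set of causes, e.g. {'a', 'b', 'c'}.
cause_sets = df["cause"].str.split(" / ").map(set)

# Every distinct cause across all rows, sorted so columns come out a, b, c, ...
all_causes = sorted(set().union(*cause_sets))

result = df[["time"]].copy()
for cause in all_causes:
    # map runs the membership test on every row; int() turns True/False into 1/0.
    result[cause] = cause_sets.map(lambda s: int(cause in s))

print(result)
```

Splitting into sets first means each cell is parsed only once, no matter how many distinct causes there are.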