I'm loading a CSV file that has two columns: date and tags. The tags column contains a list of tags like so:
date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"
Loading it with pandas results in a DataFrame where all tags are put into one column, like so:
         date        tags
0  2021-09-08  #foo, #bar
1  2021-09-10        #bar
2  2021-09-15  #bar, #baz
3  2021-09-22        #bar
So, how do I create from this a DataFrame where each tag is separated into its own column:
         date    foo    bar    baz
0  2021-09-08   True   True  False
1  2021-09-10  False   True  False
2  2021-09-15  False   True   True
3  2021-09-22  False   True  False
CodePudding user response:
Use Series.str.get_dummies, convert the resulting 0/1 values to boolean, and add the result to the date column with DataFrame.join:
df = df[['date']].join(df['tags'].str.get_dummies(', ').astype(bool))
print(df)
         date   #bar   #baz   #foo
0  2021-09-08   True  False   True
1  2021-09-10   True  False  False
2  2021-09-15   True   True  False
3  2021-09-22   True  False  False
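If you need to remove the #, add rename with a custom function (run this on the original df, in place of the line above, since that line drops the tags column):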
f = lambda x: x.lstrip('#')
df = df[['date']].join(df['tags'].str.get_dummies(', ').astype(bool).rename(columns=f))
print(df)
         date    bar    baz    foo
0  2021-09-08   True  False   True
1  2021-09-10   True  False  False
2  2021-09-15   True   True  False
3  2021-09-22   True  False  False
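For reference, here is a minimal end-to-end sketch of the approach, assuming the CSV from the question is read from an in-memory string. The names csv_text, dummies and result are only illustrative. Since get_dummies returns its columns in alphabetical order, the last step selects them explicitly to get the foo, bar, baz order shown in the question.

```python
import io

import pandas as pd

# Sample data copied from the question (held in a string for illustration).
csv_text = '''date,tags
2021-09-08,"#foo, #bar"
2021-09-10,"#bar"
2021-09-15,"#bar, #baz"
2021-09-22,"#bar"
'''

df = pd.read_csv(io.StringIO(csv_text))

# One 0/1 indicator column per tag, converted to booleans.
# The separator ', ' must match the one used in the data exactly.
dummies = df['tags'].str.get_dummies(', ').astype(bool)

# Drop the leading '#' from each column name.
dummies = dummies.rename(columns=lambda name: name.lstrip('#'))

# Optional: pick the columns explicitly to control their order.
result = df[['date']].join(dummies[['foo', 'bar', 'baz']])
print(result)
```

The explicit column list only works when the tag names are known in advance; without it, the columns stay in the alphabetical order shown above.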