I would like to create columns holding the counts of the strings that appear in a column whose values are arrays of strings.
For example, go from
idx | strings
1 | ['a','b','c']
2 | ['b','d','e']
to
idx | 'a' | 'b' | 'c' | 'd' | 'e'
1 | 1 | 1 | 1 | 0 | 0
2 | 0 | 1 | 0 | 1 | 1
CodePudding user response:
You can try explode, then str.get_dummies, and sum the result per original row:
out = df.join(df.pop('strings').explode().str.get_dummies().groupby(level=0).sum())
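The one-liner above can be run end to end as a minimal sketch. The construction of df below is an assumption made for illustration (the question does not show how the frame is built), and the steps are split out so each one is visible:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data.
df = pd.DataFrame({
    "idx": [1, 2],
    "strings": [["a", "b", "c"], ["b", "d", "e"]],
})

# explode() yields one row per list element and repeats the original index,
# str.get_dummies() one-hot encodes each element, and grouping on the
# index level sums the indicators back to one row per original row.
counts = df["strings"].explode().str.get_dummies().groupby(level=0).sum()

# Attach the count columns to the remaining columns of the frame.
out = df.drop(columns="strings").join(counts)
print(out)
```

Because the indicators are summed, a string that occurs several times within one list is counted that many times.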
CodePudding user response:
You can use Counter:
from collections import Counter
import pandas as pd
pd.DataFrame(df.strings.apply(Counter).to_list(), index=df.index)
#      a  b    c    d    e
# 1  1.0  1  1.0  NaN  NaN
# 2  NaN  1  NaN  1.0  1.0
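The NaN entries mark strings that do not occur in a row. If zeros are wanted instead, as in the question's target table, one option is to fill them and cast back to integers. A minimal sketch, assuming the same data as in the question (the df construction is a hypothetical reconstruction):

```python
from collections import Counter

import pandas as pd

# Hypothetical reconstruction of the question's data.
df = pd.DataFrame({
    "idx": [1, 2],
    "strings": [["a", "b", "c"], ["b", "d", "e"]],
})

# One Counter per row; keys missing from a row become NaN in the frame.
counts = pd.DataFrame(df["strings"].apply(Counter).to_list(), index=df.index)

# Replace NaN with 0 and restore an integer dtype.
counts = counts.fillna(0).astype(int)

# Attach the count columns to the remaining columns of the frame.
out = df.drop(columns="strings").join(counts)
print(out)
```

With Counter the columns appear in order of first occurrence rather than sorted order, so a call such as counts.sort_index(axis=1) may be needed if alphabetical columns matter.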