Summarize the problem
I have a Dataframe with a Series of lists of strings. I can use a Counter from 'collections' to iterate over each row of the series and count the times each string appeared easy enough:
for list_of_strings in df['info']:
counter = Counter()
for string in list_of_strings:
counter[string] =1
What I want is to mutate the df to now have as many columns as there were unique strings found in all of the lists of strings in the df['info'] series. The column names would be the string value.
For each row in the df, the new columns values would either be 0 if the string was not found or N where N is the count of how many times the string was found in that rows list_of_strings
Describe what you’ve tried
I was able to make a df whose # and name of columns matches the unique strings found, but I cant figure out how to append the counts per row, and Im not even sure if crafting and appending to a brand new dataframe is the right way to go about it?:
unique_df = pd.Series(copy_df['info'].explode().unique()).to_frame().T
I tried doing something using df.at() for each counter key but it exploded my juptyr notebook :\
Any help is appreciated, let me know what other info I can provide.
CodePudding user response:
Assuming this input Series:
s = pd.Series([['A','B','A'],['C','B'],['D']])
You can explode
, groupby.value_counts
and unstack
with a fill_value
of 0:
(s.explode()
.groupby(level=0)
.value_counts()
.unstack(fill_value=0)
.reindex(s.index, fill_value=0)
)
Or use crosstab
:
s2 = s.explode()
(pd.crosstab(s2.index, s2)
.rename_axis(index=None, columns=None)
)
Output:
A B C D
0 2 1 0 0
1 0 1 1 0
2 0 0 0 1