Pandas Series of lists of strings how to count and append to df per row-CodePudding

Summarize the problem

I have a Dataframe with a Series of lists of strings. I can use a Counter from 'collections' to iterate over each row of the series and count the times each string appeared easy enough:

for list_of_strings in df['info']:
    counter = Counter()
    for string in list_of_strings:
        counter[string] =1

What I want is to mutate the df to now have as many columns as there were unique strings found in all of the lists of strings in the df['info'] series. The column names would be the string value.

For each row in the df, the new columns values would either be 0 if the string was not found or N where N is the count of how many times the string was found in that rows list_of_strings

Describe what you’ve tried

I was able to make a df whose # and name of columns matches the unique strings found, but I cant figure out how to append the counts per row, and Im not even sure if crafting and appending to a brand new dataframe is the right way to go about it?:

unique_df = pd.Series(copy_df['info'].explode().unique()).to_frame().T

I tried doing something using df.at() for each counter key but it exploded my juptyr notebook :\

Any help is appreciated, let me know what other info I can provide.

CodePudding user response：

Assuming this input Series:

s = pd.Series([['A','B','A'],['C','B'],['D']])

You can explode, groupby.value_counts and unstack with a fill_value of 0:

(s.explode()
  .groupby(level=0)
  .value_counts()
  .unstack(fill_value=0)
  .reindex(s.index, fill_value=0)
)

Or use crosstab:

s2 = s.explode()

(pd.crosstab(s2.index, s2)
   .rename_axis(index=None, columns=None)
 )

Output:

   A  B  C  D
0  2  1  0  0
1  0  1  1  0
2  0  0  0  1