Home > Mobile >  Pandas Series of lists of strings how to count and append to df per row
Pandas Series of lists of strings how to count and append to df per row

Time:09-18

Summarize the problem

I have a Dataframe with a Series of lists of strings. I can use a Counter from 'collections' to iterate over each row of the series and count the times each string appeared easy enough:

for list_of_strings in df['info']:
    counter = Counter()
    for string in list_of_strings:
        counter[string] =1  

What I want is to mutate the df to now have as many columns as there were unique strings found in all of the lists of strings in the df['info'] series. The column names would be the string value.

For each row in the df, the new columns values would either be 0 if the string was not found or N where N is the count of how many times the string was found in that rows list_of_strings

Describe what you’ve tried

I was able to make a df whose # and name of columns matches the unique strings found, but I cant figure out how to append the counts per row, and Im not even sure if crafting and appending to a brand new dataframe is the right way to go about it?:

unique_df = pd.Series(copy_df['info'].explode().unique()).to_frame().T

I tried doing something using df.at() for each counter key but it exploded my juptyr notebook :\

Any help is appreciated, let me know what other info I can provide.

CodePudding user response:

Assuming this input Series:

s = pd.Series([['A','B','A'],['C','B'],['D']])

You can explode, groupby.value_counts and unstack with a fill_value of 0:

(s.explode()
  .groupby(level=0)
  .value_counts()
  .unstack(fill_value=0)
  .reindex(s.index, fill_value=0)
)

Or use crosstab:

s2 = s.explode()

(pd.crosstab(s2.index, s2)
   .rename_axis(index=None, columns=None)
 )

Output:

   A  B  C  D
0  2  1  0  0
1  0  1  1  0
2  0  0  0  1
  • Related