Home > Blockchain >  Pandas Groupby Agg: Get strings that appear the most in column of lists
Pandas Groupby Agg: Get strings that appear the most in column of lists

Time:05-21

I have a dataframe like this with IDs and Preferences in a string separated by ', ':

ID Preferences
1 banana, apple
1 banana, apple, kiwi
1 avocado, apple
2 avocado, grapes
2 banana, apple, kiwi

And I want to group by the ID and get the 2 preferences that appear the most, so the result would be like:

ID first_preference second_preference
1 apple banana
2 avocado, grapes, banana, apple, kiwi

with 'draws' being concatenated together.

I need to do this on an aggregated groupby because I have other columns I need to aggregate too.

Could anyone help me? Thanks!

CodePudding user response:

To obtain the preferences, first split and expand your DataFrame into a much longer Series. Then, count and rank the number of occurrences, and another groupby agg will allow you to join the ties for the first and second prefernences.

The index of this result will be the unique 'ID' values in your original DataFrame, so you can concat this with the results of your other groupby agg operations

Sample Data

import pandas as pd
df = pd.DataFrame({'ID': [1,1,1,2,2],
                   'Preferences': ['banana, apple', 'banana, apple, kiwi', 'avocado, apple', 'avocado, grapes',
                                   'banana, apple, kiwi']})

Code

# Expand to long Series
s = df.set_index(['ID']).Preferences.str.split(', ', expand=True).stack()

# Within each ID, rank preferences based on # of occurrences
s = (s.groupby([s.index.get_level_values(0), s.rename('preference')]).size()
      .groupby(level=0).rank(method='dense', ascending=False)
      .map({1: 'first', 2: 'second'}).rename('order'))

res = s[s.isin(['first', 'second'])].reset_index().groupby(['ID', 'order']).agg(', '.join).unstack(-1)

# Collapse MultiIndex to get simple column labels
res.columns = [f'{y}_{x}' for x,y in res.columns]

print(res)
                        first_preference second_preference
ID                                                        
1                                  apple            banana
2   apple, avocado, banana, grapes, kiwi               NaN
  • Related