Home > Net >  How to group a dataframe with 2 columns and get highest count?
How to group a dataframe with 2 columns and get highest count?

Time:12-12

I have the following dataframe (df). Col A is a category and Col B is an item in that category.

Col A Col B
red car
red car
red truck
red ball
blue bus
blue bus
blue bus
blue truck
blue car

I want to get another dataframe (df2) showing the total count of distinct categories in Col A, followed by the count highest occurring item in Col B corresponding to Col A as below:

Col A Count A Col B Count B
red 4 car 2
blue 5 bus 2

Any idea on how to generate this dataframe?

I have tried this command:

df2 = df.groupby('Col A')['Col B'].apply(lambda x: x.value_counts().index[0]).reset_index()

and I get the following result:

Col A Col B
red car
blue bus

I don't know how to get the two counts column. Any ideas?

CodePudding user response:

You can use Counter from Collections:

from collections import Counter
final = df.groupby(['Col A']).agg({'Col A':'count','Col B':list})

'''
       Col A                        Col B
Col A                                    
blue       5  [bus, bus, bus, truck, car]
red        4      [car, car, truck, ball]
'''

final['Col_b'] = final['Col B'].apply(lambda x: Counter(x).most_common(1)[0][0]) #get most common value
final['Count_b'] = final['Col B'].apply(lambda x: Counter(x).most_common(1)[0][1]) #get count of most common value
final=final.drop('Col B',axis=1).rename(columns={'Col A':'Count A'}).reset_index()

Output:

|    | Col A   |   Count A | Col_b   |   Count_b |
|---:|:--------|----------:|:--------|----------:|
|  0 | blue    |         5 | bus     |         3 |
|  1 | red     |         4 | car     |         2 |
  • Related