Home > other >  Groupby two columns and find max count of categorical values in another column (Python)
Groupby two columns and find max count of categorical values in another column (Python)

Time:10-22

What is the Max count of values in the ‘stage’ column for a particular date. For the date of 2/1/2022, we see that the box AA holds the max count of the confidential value , which is 3.

Data

stage             box   Price   date
confidential      AA    $5,000  2/1/2022
confidential      AA    $5,050  2/1/2022
confidential      AA    $5,075  2/1/2022
confidential      BB    $5,116  2/1/2022
confidential      BB    $5,154  2/1/2022
nonconfidential   BB    $0      3/1/2022

Desired

Output I desire is the highest counts of 'confidential' value for a particular date. Example we see that for the date of 2/1/2022, AA has the highest counts of the confidential string.

box   max_count   date

AA    3           2/1/2022
BB    1           3/1/2022 

Doing

df.groupby(['box', 'date']).agg({'stage': 'size']}).reset_index()

I feel groupby and .agg is correct, however, I am not sure how to incorporate the max function. I am still researching. Any suggestion is appreciated.

CodePudding user response:

I think you want:

out = (df
 .loc[df['stage'].eq('confidential')]
 .groupby(['box', 'date'], as_index=False)
 .agg(count=('stage', 'size'))
)

print(out)

Output:

  box      date  count
0  AA  2/1/2022      3
1  BB  2/1/2022      2

To get the max, you need a second groupby on date only:

out2 = out.loc[out.groupby('date')['count'].idxmax()]

Output:

  box      date  count
0  AA  2/1/2022      3
  • Related