What is the Max count of values in the ‘stage’ column for a particular date. For the date of 2/1/2022, we see that the box AA holds the max count of the confidential value , which is 3.
Data
stage box Price date
confidential AA $5,000 2/1/2022
confidential AA $5,050 2/1/2022
confidential AA $5,075 2/1/2022
confidential BB $5,116 2/1/2022
confidential BB $5,154 2/1/2022
nonconfidential BB $0 3/1/2022
Desired
Output I desire is the highest counts of 'confidential' value for a particular date. Example we see that for the date of 2/1/2022, AA has the highest counts of the confidential string.
box max_count date
AA 3 2/1/2022
BB 1 3/1/2022
Doing
df.groupby(['box', 'date']).agg({'stage': 'size']}).reset_index()
I feel groupby and .agg is correct, however, I am not sure how to incorporate the max function. I am still researching. Any suggestion is appreciated.
CodePudding user response:
I think you want:
out = (df
.loc[df['stage'].eq('confidential')]
.groupby(['box', 'date'], as_index=False)
.agg(count=('stage', 'size'))
)
print(out)
Output:
box date count
0 AA 2/1/2022 3
1 BB 2/1/2022 2
To get the max, you need a second groupby
on date only:
out2 = out.loc[out.groupby('date')['count'].idxmax()]
Output:
box date count
0 AA 2/1/2022 3