Pandas - using group by and including value counts which are larger than n

I have a table which includes salary and company_location.

I was trying to calculate the mean salary per country, and this works:

wage = df.groupby('company_location')['salary'].mean()

However, many company_location values have fewer than 5 entries, and I would like to exclude them from the report.

I know how to find the 5 countries with the most entries:

Top_5 = df['company_location'].value_counts().head(5)

I am just having a problem combining those two variables into one and making a graph out of the result.

Thank you.

CodePudding user response：

You can remove rows whose company_location value occurs fewer times than a threshold (here, fewer than 5):

df = df[df.groupby('company_location')['company_location'].transform('size') >= 5]
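
To see what the transform-based filter does, here is a minimal sketch on made-up data (the toy DataFrame and the names group_size and filtered are only for illustration, not from the question). It uses >= so that a country with exactly 5 entries is kept, since the question asks to exclude countries with fewer than 5.

```python
import pandas as pd

# Hypothetical toy data: 'US' has 5 rows, 'DE' has only 2.
df = pd.DataFrame({
    "company_location": ["US"] * 5 + ["DE"] * 2,
    "salary": [100, 110, 120, 130, 140, 60, 80],
})

# transform('size') returns, for every row, the size of that row's group,
# aligned to the original index, so it can be used directly as a row filter.
group_size = df.groupby("company_location")["company_location"].transform("size")

# Keep only rows whose country has at least 5 entries.
filtered = df[group_size >= 5]

# Mean salary per remaining country.
wage = filtered.groupby("company_location")["salary"].mean()
print(wage)
```

Only the 'US' rows survive the filter here, so wage has a single entry.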

CodePudding user response：

You can do the following to apply the groupby and aggregation only to locations with at least 5 records:

mask = df['company_location'].map(df['company_location'].value_counts()) >= 5

wage = df[mask].groupby('company_location')['salary'].mean()
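
To connect this with the value_counts() idea from the question and get to a graph, something like the following might work. It is a rough sketch on made-up data; the names keep and plot_wage are hypothetical, and the plotting helper assumes matplotlib is installed, since pandas uses it for .plot().

```python
import pandas as pd

# Hypothetical toy data: 'US' has 6 rows, 'IN' has 5, 'DE' has 2.
df = pd.DataFrame({
    "company_location": ["US"] * 6 + ["IN"] * 5 + ["DE"] * 2,
    "salary": [100, 110, 120, 130, 140, 150,
               20, 30, 40, 50, 60,
               60, 80],
})

# Count entries per country, then keep the labels with at least 5 entries.
counts = df["company_location"].value_counts()
keep = counts[counts >= 5].index

# Mean salary for the kept countries only, highest first.
wage = (
    df[df["company_location"].isin(keep)]
    .groupby("company_location")["salary"]
    .mean()
    .sort_values(ascending=False)
)
print(wage)


def plot_wage(wage):
    """Draw a bar chart of mean salary per country (requires matplotlib)."""
    ax = wage.plot(kind="bar")
    ax.set_xlabel("company_location")
    ax.set_ylabel("mean salary")
    return ax
```

Calling plot_wage(wage) and then matplotlib.pyplot.show() should display the chart. Unlike head(5), which keeps the 5 most frequent countries regardless of their counts, this keeps every country that reaches the threshold.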