Filtering the Min Value of a Mean in a GroupBy Pandas-CodePudding

I have a Star Wars People df with the following Columns:

columns = [name, height, mass, birth_year, gender, homeworld]

name is the index

I need to compute the following:

Which is the planet with the lowest average mass index of its characters? Which character/s are from that planet?

Which I tried:

df.groupby(["homeworld"]).filter(lambda row: row['mass'].mean() > 0).min()

However, I need to have the min() inside the filter because I can have more than 1 character in the homeworld that have this lowest average mass index. Right now the filter function is not doing anything, is just to show how I want the code to be.

How can I achieve that? Hopefully with the filter function.

CodePudding user response：

Use:

#aggreagate mean to Series
s = df.groupby("homeworld")['mass'].mean()

#filter out negative values and get homeworld with minimum value
out = s[s.gt(0)].idxmin()

#filter original DataFrame
df1 = df[df['homeworld'].eq(out)]

CodePudding user response：

What do you mean with "more than 1 character in the homeworld that have this lowest average mass index"?

It should not matter how many characters are present per homeworld, the groupby aggregation with the mean method will calculate the averages for you.

When I look at the question you can just do the groupby like so:

df = df.groupby(['homeworld']).mean().sort_values(by=["mass"], ascending=False)
df.head(1)

And note the homeworld that is displayed