Home > Software design >  Take Smallest Group by Average
Take Smallest Group by Average

Time:09-28

I'm looking to filter a set for the group with the smallest average value. In the example below, we'd group by ID, get the average year, and then only keep the group with the smallest average year. This feels basic, but I haven't seen an answer for it yet.

df = pd.DataFrame({'ID':[1,1,2,2,3,3], 
                   'YEAR' : [2011,2012,2012,2013,2013,2014], 
                   'V': [0,1,1,0,1,0],
                   'C':[00,11,22,33,44,55]})

CodePudding user response:

In [180]: min_id = df.groupby("ID")["YEAR"].mean().idxmin()

In [181]: df.query("ID == @min_id")
Out[181]:
   ID  YEAR  V   C
0   1  2011  0   0
1   1  2012  1  11
  • get the index of the minimum mean year having group
  • filter the frame where ID is equal to that

this is how the aggregated mean looks like:

In [182]: df.groupby("ID")["YEAR"].mean()
Out[182]:
ID
1    2011.5
2    2012.5
3    2013.5
Name: YEAR, dtype: float64

so the minimum is 2011.5, index of it is 1 (the ID).

CodePudding user response:

groups = df.groupby(['ID']).mean()

Then

groups.loc[groups['YEAR'] == groups['YEAR'].min()]
  • Related