Home > Software engineering >  Using Lambda in GroupBy .agg to find Mean
Using Lambda in GroupBy .agg to find Mean

Time:11-24

I'm not understanding how to pull the Mean of filtered data within a groupby DataFrame. It works perfectly for sum(), but mean() simply gives me the Percentage of occurrences from the total count.

df = test.groupby(['Code']).agg(
  Count=('% Change', 'count'),
  H2C_Up_Mean=('% C2H',lambda x: (x > 0).mean()),
  H2C_Pct_Up=('% C2H',lambda x: (x > 0).sum())

"H2C_Pct_Up" properly sums when I manually tally in Excel. "H2C_Up_Mean" gives me "H2C_Pct_Up" DIVIDED by "Count"

How can I pull the 'Mean' of when "x > 0" from within a groupby dataframe?

Example Dataframe:

code   % change    % C2H
abc    0.50        0.75
abc    1.00        0.25
abc    -0.25       -0.50

Expected output...

  • Count : 3 (CORRECT)
  • H2C_Pct_Up : 2 (CORRECT)
  • H2C_Up_Mean : 0.50 (ERROR; This produces 0.667)

CodePudding user response:

df = test.groupby(['Code']).agg(
  Count=('% Change', 'count'),
  H2C_Up_Mean=('% C2H',lambda x: x[x > 0].mean()),
  H2C_Pct_Up=('% C2H',lambda x: (x > 0).sum()))

gives df as:

      Count  H2C_Up_Mean  H2C_Pct_Up
Code                                
abc       3          0.5           2
  • Related