I'm not understanding how to pull the Mean of filtered data within a groupby DataFrame. It works perfectly for sum(), but mean() simply gives me the Percentage of occurrences from the total count.
df = test.groupby(['Code']).agg(
Count=('% Change', 'count'),
H2C_Up_Mean=('% C2H',lambda x: (x > 0).mean()),
H2C_Pct_Up=('% C2H',lambda x: (x > 0).sum())
"H2C_Pct_Up" properly sums when I manually tally in Excel. "H2C_Up_Mean" gives me "H2C_Pct_Up" DIVIDED by "Count"
How can I pull the 'Mean' of when "x > 0" from within a groupby dataframe?
Example Dataframe:
code % change % C2H
abc 0.50 0.75
abc 1.00 0.25
abc -0.25 -0.50
Expected output...
- Count : 3 (CORRECT)
- H2C_Pct_Up : 2 (CORRECT)
- H2C_Up_Mean : 0.50 (ERROR; This produces 0.667)
CodePudding user response:
df = test.groupby(['Code']).agg(
Count=('% Change', 'count'),
H2C_Up_Mean=('% C2H',lambda x: x[x > 0].mean()),
H2C_Pct_Up=('% C2H',lambda x: (x > 0).sum()))
gives df
as:
Count H2C_Up_Mean H2C_Pct_Up
Code
abc 3 0.5 2