I want to set skipna=False when I use the agg method on a DataFrame.
My DataFrame has many (dynamic) columns. I'm performing a groupby and aggregating using agg, like:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2], "B": [np.nan, np.nan], "C": [0, 0]})
# the sum of B is 0.0
df.agg({"A": "sum", "B": "sum", "C": "max"})
When I'm aggregating a single column, or using a single aggregation function across the entire DataFrame, I can add skipna=False so that the nan values aren't skipped, i.e. df["B"].sum(skipna=False) or df.sum(skipna=False). This doesn't work for me because I'm applying several different functions (sum, avg, max) to different columns.
How can I pass that skipna argument via the agg method?
CodePudding user response:
Personally I'd do:
out = pd.Series({'A': df['A'].sum(skipna=False),
                 'B': df['B'].sum(skipna=False),
                 'C': df['C'].max()
                 })
Also, agg with a lambda would work as well:
df.agg({'A': lambda x: x.sum(skipna=False),
        'B': lambda x: x.sum(skipna=False),
        'C': 'max'})
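Since the question mentions many dynamic columns, writing one lambda per column by hand may not scale. Here is a minimal sketch that builds the lambdas from a dict of method names instead. The names no_skip and spec are made up for illustration, and it assumes every method named in spec accepts a skipna argument (sum, mean, max and min do).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [np.nan, np.nan], "C": [0, 0]})

def no_skip(name):
    """Hypothetical helper: wrap the named Series reduction so it runs with skipna=False."""
    def reducer(s):
        return getattr(s, name)(skipna=False)
    reducer.__name__ = name  # keep a readable name instead of "reducer"
    return reducer

# Assumed mapping of column -> reduction name; build it dynamically as needed.
spec = {"A": "sum", "B": "sum", "C": "max"}

out = df.agg({col: no_skip(func) for col, func in spec.items()})
print(out)
```

With the sample frame, B should come out as NaN rather than 0.0, because the NaN values are no longer skipped.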
CodePudding user response:
If you have a lot of columns to aggregate, here is another approach:
d = {"A": "sum", "B": "sum", "C": "max"}
pd.Series({c: getattr(df[c], f)(skipna=False) for c, f in d.items()})
A    3.0
B    NaN
C    0.0
dtype: float64
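The question also mentions groupby, which the snippets above don't cover. A rough sketch of the same idea applied per group might look like this. The grouping column g, the sample data and the helper no_skip are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Made-up sample data with a grouping column "g".
df = pd.DataFrame({
    "g": ["x", "x", "y", "y"],
    "A": [1, 2, 3, 4],
    "B": [1.0, np.nan, 2.0, 3.0],
})

def no_skip(name):
    """Hypothetical helper: call the named Series reduction with skipna=False."""
    return lambda s: getattr(s, name)(skipna=False)

# Assumed mapping of column -> reduction name.
spec = {"A": "sum", "B": "sum"}

# Each callable receives one group's column as a Series.
out = df.groupby("g").agg({col: no_skip(func) for col, func in spec.items()})
print(out)
```

Here group x contains a NaN in B, so its sum should be NaN, while group y has no missing values and sums normally. Python callables in groupby agg are slower than the built-in string aggregations, so this may be noticeable on large frames.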