Passing `skipna` argument to `agg`

Time:09-07

I want to set skipna=False when I use the agg method on a DataFrame.

My DataFrame has many (dynamic) columns. I'm grouping with groupby and aggregating with agg, like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2], "B": [np.nan, np.nan], "C": [0, 0]})

# the sum of B is 0.0 because NaN values are skipped by default
df.agg({"A": "sum", "B": "sum", "C": "max"})

When I aggregate a single column, or apply one aggregation function to the entire DataFrame, I can pass skipna=False so that NaN values aren't skipped, e.g. df["B"].sum(skipna=False) or df.sum(skipna=False). That doesn't help here because I'm applying different functions (sum, mean, max) to different columns.

How can I pass that skipna argument via the agg method?

CodePudding user response:

Personally I'd do:

out = pd.Series({'A': df['A'].sum(skipna=False), 
                 'B': df['B'].sum(skipna=False),
                 'C': df['C'].max()
                })

agg with a lambda works as well:

df.agg({'A': lambda x: x.sum(skipna=False),
        'B': lambda x: x.sum(skipna=False),
        'C': 'max'})
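Since the question mentions groupby, here is a rough sketch of the same idea applied per group. The grouping column "g", the sample values, and the helper name sum_keep_nan are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "g" is an assumed grouping column, not part of the question.
df = pd.DataFrame({
    "g": ["x", "x", "y"],
    "A": [1, 2, 3],
    "B": [np.nan, 1.0, 2.0],
    "C": [0, 5, 1],
})

def sum_keep_nan(s):
    # Sum one group's values; the result is NaN if the group contains any NaN.
    return s.sum(skipna=False)

out = df.groupby("g").agg({"A": sum_keep_nan, "B": sum_keep_nan, "C": "max"})
print(out)
```

Group "x" contains a NaN in B, so its sum should come out as NaN, while group "y" sums normally.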

CodePudding user response:

If you have a lot of columns to aggregate, here is another approach:

d = {"A": "sum", "B": "sum", "C": "max"}
pd.Series({c: getattr(df[c], f)(skipna=False) for c, f in d.items()})

A    3.0
B    NaN
C    0.0
dtype: float64
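If you would rather keep calling agg, the same name-to-function mapping can be turned into callables. This is only a sketch: keep_nan is a hypothetical helper, and it assumes every function named in d accepts skipna. The factory function is there on purpose, because a bare lambda inside the dict comprehension would look up f late and every column would end up using the last function in d.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [np.nan, np.nan], "C": [0, 0]})

def keep_nan(name):
    # Hypothetical helper: build a function that calls Series.<name>(skipna=False).
    # Binding `name` here avoids the late-binding pitfall of a lambda in a comprehension.
    def reducer(s):
        return getattr(s, name)(skipna=False)
    return reducer

d = {"A": "sum", "B": "sum", "C": "max"}
res = df.agg({c: keep_nan(f) for c, f in d.items()})
print(res)
```

For the sample frame this should agree with the output shown above: 3 for A, NaN for B and 0 for C.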