dropping NaN in the pandas aggregation

Time:04-19

I am creating a pandas dataframe as follows:

import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [1, 2, None, 4]}
x = pd.DataFrame(data)

     A    B
0  1.0  1.0
1  2.0  2.0
2  NaN  NaN
3  4.0  4.0

Now, I aggregate across columns as:

x.agg("sum", axis='columns')

This results in:

0    2.0
1    4.0
2    0.0
3    8.0
dtype: float64

In this case, it seems to treat NaN as 0, which is not what I want. Is it possible to keep missing values as NaN in the aggregation, i.e., so that the sum is NaN for such rows?

User response:

Pass skipna=False so that missing values are not skipped. The sum of any row that contains a NaN is then NaN.

x.agg("sum", axis='columns', skipna=False)

output:

0    2.0
1    4.0
2    NaN
3    8.0
dtype: float64
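For reference, here is a self-contained version of the above that can be run as a script. It is a minimal sketch that rebuilds the question's DataFrame and compares the default result with the skipna=False one; the names default_sum and strict_sum are illustrative only.

```python
import pandas as pd

# Rebuild the DataFrame from the question
x = pd.DataFrame({'A': [1, 2, None, 4], 'B': [1, 2, None, 4]})

# Default (skipna=True): NaNs are skipped, so the all-NaN row sums to 0.0
default_sum = x.agg("sum", axis='columns')

# skipna=False: a NaN anywhere in a row makes that row's sum NaN
strict_sum = x.agg("sum", axis='columns', skipna=False)

print(default_sum.tolist())
print(strict_sum.tolist())
```

The keyword argument skipna=False is passed through agg to the underlying sum call, so x.sum(axis='columns', skipna=False) should give the same result.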

User response:

In pandas, methods for computing descriptive statistics have a skipna option that controls whether missing values are excluded. It is True by default, so NaNs are skipped. If you set skipna=False in the method call, the NaNs are not skipped, and any row containing a NaN produces NaN.
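As an illustration that skipna works the same way across descriptive-statistics methods, the sketch below calls sum, mean and max directly with and without skipna=False. It uses a small hypothetical DataFrame (not the one from the question) with one partially missing row, because for an all-NaN row mean and max already return NaN even with the default skipna=True. The names y and results are illustrative only.

```python
import pandas as pd

# Hypothetical data: row 1 has one value present and one missing
y = pd.DataFrame({'A': [1.0, 3.0], 'B': [2.0, None]})

results = {}
for method in ("sum", "mean", "max"):
    func = getattr(y, method)  # e.g. y.sum, y.mean, y.max
    # skipna=True (the default): NaNs are ignored within each row
    skipped = func(axis='columns')
    # skipna=False: a NaN anywhere in a row propagates to that row's result
    kept = func(axis='columns', skipna=False)
    results[method] = (skipped, kept)
    print(method, skipped.tolist(), kept.tolist())
```

With the default, row 1 is computed from the single value 3.0; with skipna=False, every method returns NaN for that row.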
