Dropping NaN in the pandas aggregation

I am creating a pandas dataframe as follows:

import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [1, 2, None, 4]}
x = pd.DataFrame(data)

     A    B
0  1.0  1.0
1  2.0  2.0
2  NaN  NaN
3  4.0  4.0

Now, I aggregate across columns as:

x.agg("sum", axis='columns')

This results in:

0    2.0
1    4.0
2    0.0
3    8.0
dtype: float64

In this case, it seems to be treating NaN as 0, which is not what I want. Is it possible to keep missing values as NaN in the aggregation, i.e. have the sum be NaN for such rows?

CodePudding user response：

Pass skipna=False so that missing values are not skipped. Any row that contains a NaN then sums to NaN.

x.agg("sum", axis='columns', skipna=False)

output:

0    2.0
1    4.0
2    NaN
3    8.0
dtype: float64
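
Note that skipna=False returns NaN for any row with at least one missing value. If NaN is wanted only when every value in the row is missing, the min_count parameter of DataFrame.sum may be a better fit. A minimal sketch of the difference, assuming a hypothetical variant of the question's frame (named y here) in which row 1 is only partially missing:

```python
import pandas as pd

# Hypothetical variant of the question's data (not from the original post):
# row 1 is partially missing, row 2 is entirely missing.
y = pd.DataFrame({'A': [1, 2, None, 4], 'B': [1, None, None, 4]})

# skipna=False: a single NaN in a row makes that row's sum NaN.
strict = y.sum(axis='columns', skipna=False)

# min_count=1: NaNs are still skipped, but a row needs at least one
# non-missing value; otherwise its sum is NaN instead of 0.
lenient = y.sum(axis='columns', min_count=1)

print(strict)
print(lenient)
```

With this data, strict is NaN for rows 1 and 2, while lenient is NaN only for row 2. Which behavior is right depends on whether a partially missing row should count as missing.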

CodePudding user response：

In pandas, the methods for computing descriptive statistics take a skipna option that controls whether missing data is excluded. It is True by default, which is why a row containing only NaNs sums to 0. Passing skipna=False in the method call stops the NaNs from being skipped, so any NaN in a row propagates to the result.
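
To see the effect of the option side by side, here is a small sketch using the frame from the question. It calls DataFrame.sum directly rather than going through agg. The variable names default_sum and strict_sum are just illustrative.

```python
import pandas as pd

# Same data as in the question.
x = pd.DataFrame({'A': [1, 2, None, 4], 'B': [1, 2, None, 4]})

# Default (skipna=True): NaNs are excluded, so the all-NaN row sums to 0.0.
default_sum = x.sum(axis='columns')

# skipna=False: NaNs are kept in the computation and propagate to the result.
strict_sum = x.sum(axis='columns', skipna=False)

# Show both results next to each other.
print(pd.DataFrame({'skipna=True': default_sum, 'skipna=False': strict_sum}))
```

Row 2 is 0.0 in the first column and NaN in the second; the other rows are identical.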
