I am creating a pandas DataFrame as follows:
import pandas as pd
data = {'A': [1, 2, None, 4], 'B': [1, 2, None, 4]}
x = pd.DataFrame(data)
     A    B
0  1.0  1.0
1  2.0  2.0
2  NaN  NaN
3  4.0  4.0
Now I aggregate across the columns:
x.agg("sum", axis='columns')
This results in:
0 2.0
1 4.0
2 0.0
3 8.0
dtype: float64
In this case, it seems to treat NaN as 0, which is not what I want. Is it possible to keep missing values as NaN during the aggregation, i.e. so that the sum is NaN for such rows?
Answer:
Calculate the row-wise sum without skipping missing values, so that any row containing NaN sums to NaN:
x.agg("sum", axis='columns', skipna=False)
Output:
0 2.0
1 4.0
2 NaN
3 8.0
dtype: float64
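A self-contained version of this answer, for checking it end to end. It assumes, as the answer above does, that keyword arguments passed to DataFrame.agg along with a string function name are forwarded to that method (this is how skipna reaches DataFrame.sum), and compares the result with calling DataFrame.sum directly.

```python
import pandas as pd

# Same data as in the question: row 2 is entirely missing.
data = {'A': [1, 2, None, 4], 'B': [1, 2, None, 4]}
x = pd.DataFrame(data)

# Default behaviour: NaN is skipped, so the all-NaN row sums to 0.0.
default_sum = x.agg("sum", axis='columns')

# skipna=False is passed through to DataFrame.sum, so NaN propagates.
strict_sum = x.agg("sum", axis='columns', skipna=False)

# Calling DataFrame.sum directly should give the same result.
direct_sum = x.sum(axis=1, skipna=False)

print(default_sum.tolist())  # [2.0, 4.0, 0.0, 8.0]
print(strict_sum.tolist())   # [2.0, 4.0, nan, 8.0]
```

Calling x.sum(axis=1, skipna=False) directly is arguably the simpler spelling when only a single aggregation is needed; agg is mainly useful when combining several functions.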
Answer:
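In pandas, methods for computing descriptive statistics (sum, mean, min, max, etc.) have a skipna option that controls whether missing values are excluded. It is True by default, which is why the all-NaN row sums to 0. You can set it to False by passing skipna=False to the method. Then the NaNs are not skipped, and any row containing a NaN produces NaN.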
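A small sketch of how skipna behaves on a hypothetical frame (y is illustrative, not from the question) where one row is partly missing and one is entirely missing. It also shows that an all-NaN row sums to 0.0 by default because DataFrame.sum has min_count=0, while mean returns NaN for it; min_count=1 is included as an alternative for the case where you only want NaN when every value in a row is missing.

```python
import pandas as pd

# Hypothetical data: row 1 is partly missing, row 2 is entirely missing.
y = pd.DataFrame({'A': [1.0, 2.0, None], 'B': [3.0, None, None]})

# skipna=True (the default): NaN values are ignored.
sum_skip = y.sum(axis=1)      # [4.0, 2.0, 0.0]  (empty sum is 0 since min_count=0)
mean_skip = y.mean(axis=1)    # [2.0, 2.0, nan]  (mean of no values is NaN)

# skipna=False: any NaN in a row makes that row's result NaN.
sum_strict = y.sum(axis=1, skipna=False)    # [4.0, nan, nan]
mean_strict = y.mean(axis=1, skipna=False)  # [2.0, nan, nan]

# min_count=1: skip NaN, but return NaN when a row has no valid values.
sum_min1 = y.sum(axis=1, min_count=1)       # [4.0, 2.0, nan]

for name, s in [("sum_skip", sum_skip), ("mean_skip", mean_skip),
                ("sum_strict", sum_strict), ("mean_strict", mean_strict),
                ("sum_min1", sum_min1)]:
    print(name, s.tolist())
```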