NumPy and pandas DataFrame evaluate the median differently

Time:04-01

Why do pandas and NumPy evaluate some basic functions, such as the median, differently?

pandas omits NaN values automatically; NumPy does not.

import numpy as np
import pandas as pd

np.random.seed(10)

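# Build the column as float so NaN can be assigned without changing its dtype
df = pd.DataFrame(np.random.randint(0, 10, size=10).astype(float), columns=['x'])
# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
df.loc[df.x > 1, 'x'] = np.nan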

print(df)

#     x
#0  NaN
#1  NaN
#2  0.0
#3  1.0
#4  NaN
#5  0.0
#6  1.0
#7  NaN
#8  NaN
#9  0.0

print(df['x'].median())

#0.0

print(np.median(df['x']))

#nan

CodePudding user response:

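They are two different libraries with different defaults. pandas reductions skip NaN by default (skipna=True), while np.median propagates NaN, so a single NaN in the input makes the result NaN.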
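The same difference in defaults shows up in other reductions, not only the median. A minimal sketch, assuming a small made-up Series (the values are only for illustration and are not the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two NaN values and three real numbers
s = pd.Series([np.nan, 0.0, 1.0, np.nan, 0.0])
arr = s.to_numpy()

# pandas skips NaN by default (skipna=True)
pd_mean = s.mean()
pd_sum = s.sum()

# Plain NumPy reductions propagate NaN
np_mean = np.mean(arr)
np_sum = np.sum(arr)

# NumPy's nan-aware variants ignore NaN, like the pandas defaults
np_nanmean = np.nanmean(arr)
np_nansum = np.nansum(arr)

print(pd_mean, np_nanmean)  # both are the mean of 0.0, 1.0, 0.0
print(np_mean, np_sum)      # nan nan
print(pd_sum, np_nansum)    # 1.0 1.0
```

So the pandas method with its default arguments lines up with the NumPy function that has the nan prefix, not with the plain one.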

If you want to ignore the NaN:

np.nanmedian(df['x'])
df['x'].median()

If you want NaN to propagate to the result:

np.median(df['x'])
df['x'].median(skipna=False)
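To check that the calls really pair up this way, here is a small self-contained sketch. The Series below is a made-up stand-in for df['x'], and the variable names are just for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['x']: some NaN values mixed with 0.0 and 1.0
s = pd.Series([np.nan, 0.0, 1.0, np.nan, 0.0])
arr = s.to_numpy()

# Ignore NaN: both take the median of 0.0, 1.0, 0.0
skip_pd = s.median()
skip_np = np.nanmedian(arr)

# Keep NaN: both return NaN because the input contains NaN
keep_pd = s.median(skipna=False)
keep_np = np.median(arr)

print(skip_pd, skip_np)  # 0.0 0.0
print(keep_pd, keep_np)  # nan nan
```

Since NaN never compares equal to itself, use np.isnan (or pd.isna) rather than == when checking the second pair.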