How to fill a NumPy array's NaN values with the means of its columns?

I convert part of a pandas DataFrame to a NumPy array, and I want to fill its NaN values with the mean of each column, similarly to how I would do the following in pandas:

df.fillna(df.mean(), inplace=True)

The only way I have been able to do it so far is to iterate over the columns. Is there another way?

Thank you!
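For comparison with the answers, a column-by-column loop like the one the question mentions might look like the sketch below. The asker's actual loop isn't shown, so this is an assumption, and the function name fill_nan_with_col_mean_loop is hypothetical.

```python
import numpy as np


def fill_nan_with_col_mean_loop(arr):
    """Hypothetical loop-based baseline: fill NaNs one column at a time."""
    out = arr.astype(float, copy=True)  # work on a float copy, leave arr untouched
    for j in range(out.shape[1]):
        col = out[:, j]  # basic slice, so it is a view and writes go into out
        mask = np.isnan(col)
        if mask.any():
            col[mask] = np.nanmean(col)  # mean of the non-NaN entries in this column
    return out


a = np.array([[1.0, 5.0], [np.nan, np.nan], [2.0, 8.0], [6.0, 2.0]])
print(fill_nan_with_col_mean_loop(a))
```

This works, but the Python-level loop runs once per column; the answers avoid it by computing all column means at once with np.nanmean(..., axis=0).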

Answer:

You can use np.nanmean to compute the column means and np.take to place each mean at the NaN positions of its column:

Set up a minimal reproducible example (MRE):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 6], 'B': [5, np.nan, 8, 2]})

m = df.to_numpy(copy=True)  # copy=True: a writable array independent of df
print(m)

# Output
[[ 1.  5.]
 [nan nan]
 [ 2.  8.]
 [ 6.  2.]]

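mean = np.nanmean(m, axis=0)      # column means, ignoring NaNs
idx = np.where(np.isnan(m))       # (row_indices, col_indices) of every NaN
m[idx] = np.take(mean, idx[1])    # each NaN gets the mean of its own column
print(m)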

# Output
[[1. 5.]
 [3. 5.]
 [2. 8.]
 [6. 2.]]
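To reuse the np.take approach, the same three steps can be wrapped in a small function. This is a sketch; the name fill_nan_with_col_mean is hypothetical. It fills a float copy, so the caller's array (which may share memory with a DataFrame) is not modified. np.take(col_means, cols) is equivalent to the fancy-indexing form col_means[cols].

```python
import numpy as np


def fill_nan_with_col_mean(arr):
    """Return a copy of 2-D array `arr` with NaNs replaced by column means.

    Hypothetical helper wrapping the np.take approach.
    """
    out = np.array(arr, dtype=float)  # np.array copies by default
    col_means = np.nanmean(out, axis=0)  # one mean per column, NaNs ignored
    rows, cols = np.where(np.isnan(out))  # coordinates of every NaN
    out[rows, cols] = np.take(col_means, cols)  # same as col_means[cols]
    return out


m = np.array([[1.0, 5.0], [np.nan, np.nan], [2.0, 8.0], [6.0, 2.0]])
print(fill_nan_with_col_mean(m))
```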

Answer:

You can also use np.where, which broadcasts the per-column means across the rows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2, 1, np.nan, 6], 'B': [4, np.nan, 8, np.nan]})

a = df.to_numpy()
print(a)
# [[ 2.  4.]
#  [ 1. nan]
#  [nan  8.]
#  [ 6. nan]]

a = np.where(np.isnan(a), np.nanmean(a, axis=0), a)  # returns a new array; NaNs -> column means
print(a)

Output:

[[2. 4.]
 [1. 6.]
 [3. 8.]
 [6. 6.]]
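One caveat applies to both answers: if a column contains only NaNs, np.nanmean returns NaN for it (with a RuntimeWarning, "Mean of empty slice"), so those NaNs remain. The sketch below assumes a constant fallback value is acceptable for such columns; the function name fill_nan_with_col_mean_safe and its fallback parameter are hypothetical.

```python
import warnings

import numpy as np


def fill_nan_with_col_mean_safe(arr, fallback=0.0):
    """np.where variant that tolerates all-NaN columns.

    `fallback` (hypothetical) is used for columns with no non-NaN entries,
    where np.nanmean would otherwise return NaN.
    """
    arr = np.asarray(arr, dtype=float)
    with warnings.catch_warnings():
        # silence the "Mean of empty slice" warning for all-NaN columns
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_means = np.nanmean(arr, axis=0)
    col_means = np.where(np.isnan(col_means), fallback, col_means)
    # col_means has shape (n_cols,) and broadcasts across the rows of arr
    return np.where(np.isnan(arr), col_means, arr)


a = np.array([[2.0, np.nan], [np.nan, np.nan], [4.0, np.nan]])
print(fill_nan_with_col_mean_safe(a))
```

Like the np.where answer, this returns a new array instead of modifying the input in place.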