I convert part of a pandas DataFrame to a NumPy array and want to fill its NaN values with the column means, similarly to how I would do the following in pandas:
df.fillna(df.mean(), inplace = True)
The only way I have been able to do it so far is to iterate over the columns. Is there another way?
Thank you!
CodePudding user response:
You can use np.take. First, set up a minimal reproducible example (MRE):
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, np.nan, 2, 6], 'B': [5, np.nan, 8, 2]})
m = df.to_numpy()
print(m)
# Output
[[ 1.  5.]
 [nan nan]
 [ 2.  8.]
 [ 6.  2.]]
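# Column means, ignoring NaN
mean = np.nanmean(m, axis=0)
# (row indices, column indices) of the NaN entries
idx = np.where(np.isnan(m))
# For each NaN, pick the mean of its column
m[idx] = np.take(mean, idx[1])
print(m)
# Output
[[1. 5.]
 [3. 5.]
 [2. 8.]
 [6. 2.]]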
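For reuse, the steps above could be wrapped in a small helper. This is only a sketch: the function name fill_nan_with_col_mean is made up for illustration, and it assumes a 2-D float array in which every column has at least one non-NaN value. Note that for a 1-D array of means, np.take(mean, cols) gives the same result as mean[cols].

```python
import numpy as np

def fill_nan_with_col_mean(m):
    """Fill NaNs in a 2-D float array, in place, with each column's mean.

    Hypothetical helper name; assumes no column is entirely NaN.
    """
    col_mean = np.nanmean(m, axis=0)         # one mean per column, NaN ignored
    rows, cols = np.where(np.isnan(m))       # positions of the NaN entries
    m[rows, cols] = np.take(col_mean, cols)  # equivalent to col_mean[cols]
    return m

m = np.array([[1.0, 5.0], [np.nan, np.nan], [2.0, 8.0], [6.0, 2.0]])
fill_nan_with_col_mean(m)
print(m)
```

Because the assignment goes through m[rows, cols], the array is modified in place, which is the closest match to inplace=True in the pandas version.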
CodePudding user response:
You can use np.where, as shown below:
df = pd.DataFrame({'A': [2, 1, np.nan, 6], 'B': [4, np.nan, 8, np.nan]})
a = df.to_numpy()
print(a)
# [[ 2. 4.]
# [ 1. nan]
# [nan 8.]
# [ 6. nan]]
a = np.where(np.isnan(a), np.nanmean(a, axis=0), a)
print(a)
Output:
[[2. 4.]
[1. 6.]
[3. 8.]
[6. 6.]]
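This works because np.where broadcasts its arguments: the 1-D array of column means is stretched across the rows to match the shape of the 2-D array. A minimal sketch that makes this step explicit with np.broadcast_to (the names col_mean, expanded and filled are only illustrative):

```python
import numpy as np

a = np.array([[2.0, 4.0], [1.0, np.nan], [np.nan, 8.0], [6.0, np.nan]])

col_mean = np.nanmean(a, axis=0)               # shape (2,): one mean per column
expanded = np.broadcast_to(col_mean, a.shape)  # shape (4, 2): means repeated per row
filled = np.where(np.isnan(a), expanded, a)    # take the mean where a is NaN

print(filled)
```

Unlike index assignment, np.where returns a new array and leaves a unchanged, so the result has to be assigned back if you want to replace the original.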