numpy: column-wise operation with conditions

I want to do some operations on a matrix that contains a lot of NaNs. The goal is to go through the matrix column-wise and fill all NaNs in a column if the column otherwise holds a single constant value. That is, if all non-NaN entries of a column are equal, every NaN in that column should be filled with that value.

For example, the input might look like the matrix below. The second column should be filled entirely with 2.0, but the last column should stay as it is, because it contains two different values (2.0 and 3.0). I include code that does this, but I would love a more efficient approach, as it is going to be applied to large matrices.

# INPUT
[[0.15266373        nan 0.06203641 0.45986034        nan]
 [0.92699705        nan 0.76849622 0.26920507        nan]
 [0.09337326 2.         0.58961375 0.34334054        nan]
 [0.62647321 2.         0.55225681 0.26886006 2.        ]
 [0.2229281         nan 0.39064809 0.19316241 3.        ]]

# OUTPUT
[[0.15266373 2.         0.06203641 0.45986034        nan]
 [0.92699705 2.         0.76849622 0.26920507        nan]
 [0.09337326 2.         0.58961375 0.34334054        nan]
 [0.62647321 2.         0.55225681 0.26886006 2.        ]
 [0.2229281  2.         0.39064809 0.19316241 3.        ]]

# CODE FOR MOCK MATRIX AND FILLING OF NANs
# -----------------------------------------
import numpy as np 
# PREP OF MOCK MATRIX

np.random.seed(777)
a = np.random.rand(5, 5)
a[:,1] = np.nan
a[[2, 3],1] = 2.0

a[:,4] = np.nan
a[4,4] = 3.0
a[3,4] = 2.0


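# FILL THE WANTED STRUCTURE
for c in range(a.shape[1]):
    # Distinct non-NaN values in column c
    values = np.unique(a[~np.isnan(a[:, c]), c])
    # Exactly one distinct value: fill the whole column with it
    if values.size == 1:
        a[:, c] = values[0]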

Any help is appreciated. Best

CodePudding user response:

This is one way to do it. Ignoring NaNs, a column is constant exactly when its minimum equals its maximum:

colmin = np.nanmin(a, axis=0)
colmax = np.nanmax(a, axis=0)
b = (colmin == colmax)
a[:,b] = colmin[b]

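np.nanmin and np.nanmax emit a RuntimeWarning if a column is entirely NaN. Such columns are left unchanged, because their min and max are both NaN and NaN == NaN is False. If you wish to hide these warnings, Python's standard warnings module can filter them.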
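
For reference, the min/max approach can be wrapped into a small function that also silences the all-NaN warning. This is only a sketch: the function name `fill_constant_columns` and the small test matrix are made up for illustration, and it assumes a 2-D float array. It works on a copy, so the input is not modified.

```python
import warnings

import numpy as np


def fill_constant_columns(a):
    """Return a copy of `a` where NaNs are filled in every column
    whose non-NaN values are all equal (hypothetical helper name)."""
    out = np.array(a, dtype=float)  # copy, so the caller's array is untouched
    with warnings.catch_warnings():
        # nanmin/nanmax issue a RuntimeWarning for all-NaN columns
        warnings.simplefilter("ignore", category=RuntimeWarning)
        colmin = np.nanmin(out, axis=0)
        colmax = np.nanmax(out, axis=0)
    # True where a column has a single distinct non-NaN value;
    # all-NaN columns give NaN == NaN, which is False, so they are skipped
    constant = colmin == colmax
    out[:, constant] = colmin[constant]
    return out


# Illustrative input: column 1 is constant, column 2 is all NaN,
# column 3 holds two different values
a = np.array([
    [0.1, np.nan, np.nan, np.nan],
    [0.2, 2.0,    np.nan, 2.0],
    [0.3, np.nan, np.nan, 3.0],
])
filled = fill_constant_columns(a)
print(filled)
```

Only column 1 is filled here. The all-NaN column and the column with two distinct values come back unchanged, matching the behaviour of the loop in the question.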