Filtering NaN with numpy

I have an array with shape (115, 2) and each column has 115 numbers. Some numbers are NaN in the 2nd column. How do I filter both columns with numpy to remove the NaN from the second column and the corresponding numbers in the first column?

Example (input first, then the desired output)

array([[10., 10.],
       [20., 13.],
       [ 5., nan],
       [ 6., nan]])

array([[10., 10.],
       [20., 13.]])

I want to filter both columns to exclude the values where the second column is NaN. I want to retain the shape so I can run statistics on the numbers like correlation. Any ideas?

If I try ~np.isnan the array loses its shape, which I want to retain. No pandas please!

CodePudding user response：

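Say data is your numpy array. Then new_data would be your numpy array with the NaN values removed. Note that the mask here has the same shape as data, so new_data comes back as a flat 1D array.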

new_data = data[np.logical_not(np.isnan(data))]

CodePudding user response：

arr[~np.isnan(arr[:,1])]

This filters the rows, using only a 1D array of booleans.

I guess that when you say "I've tried ~np.isnan", you mean that you tried arr[~np.isnan(arr)]. But since arr is 2D (shape (115, 2)), so is ~np.isnan(arr): it is a boolean array of shape (115, 2). Indexing with such an array returns all the values of arr where the mask is True, as a flat 1D array. The original shape is lost (and how could it be kept, since the two columns may not have the same number of remaining elements?).

~np.isnan(arr[:,1]), on the other hand, is a 1D array of booleans with shape (115,). Indexing with it just selects rows.
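
To make the difference concrete, here is a minimal sketch using the small example array from the question. The names full_mask, row_mask, flat and filtered are made up for illustration, and the real array would have shape (115, 2) instead of (4, 2).

```python
import numpy as np

# The small example from the question (the real array has shape (115, 2)).
arr = np.array([[10., 10.],
                [20., 13.],
                [ 5., np.nan],
                [ 6., np.nan]])

full_mask = ~np.isnan(arr)        # 2D mask, same shape as arr
row_mask = ~np.isnan(arr[:, 1])   # 1D mask, one boolean per row

flat = arr[full_mask]             # 1D: every non-NaN value, row structure lost
filtered = arr[row_mask]          # 2D: only rows whose 2nd column is not NaN

print(full_mask.shape, flat.shape)      # (4, 2) (6,)
print(row_mask.shape, filtered.shape)   # (4,) (2, 2)
```

The number of rows changes, of course, but filtered is still a two-column array, which is what is needed for column-wise statistics.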

See this example of indexing with arrays of booleans:

import numpy as np

arr = np.arange(10).reshape(5, -1)
#array([[0, 1],
#       [2, 3],
#       [4, 5],
#       [6, 7],
#       [8, 9]])

# Indexing with a 2D array of booleans (same shape as arr, so 5x2 here)

arr[[[True, False], [False, False], [True, True], [False, False], [False, True]]]
#array([0, 4, 5, 9])
# This selects the 1st element of the 1st row, both elements of the 3rd row, and the last element of the last row, so 0, 4, 5, 9

# Indexing only along 1st axis, so with a 1d-array of 5 booleans, to select rows
arr[[True, False, True, True, False]]
#array([[0, 1],
#       [4, 5],
#       [6, 7]])
# I selected 1st, 3rd and 4th rows
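
Since the goal in the question was to run statistics such as correlation on the filtered columns, here is a rough sketch of that last step. The data array below is hypothetical (not the asker's data), and np.corrcoef is just one possible way to get a Pearson correlation.

```python
import numpy as np

# Hypothetical data: 6 rows, with NaN only in the second column.
data = np.array([[1., 2.],
                 [2., np.nan],
                 [3., 6.],
                 [4., 8.5],
                 [5., np.nan],
                 [6., 11.]])

# Keep only the rows whose second column is not NaN.
clean = data[~np.isnan(data[:, 1])]

# Pearson correlation between the two remaining columns.
r = np.corrcoef(clean[:, 0], clean[:, 1])[0, 1]

print(clean.shape)  # (4, 2)
print(r)
```

Because the rows are dropped as a whole, each value in the first column stays paired with its value in the second column, which is what the correlation needs.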