I have an array with shape (115, 2) and each column has 115 numbers. Some numbers are NaN in the 2nd column. How do I filter both columns with numpy to remove the NaN from the second column and the corresponding numbers in the first column?
Example
array([[10., 10.],
[20., 13.],
[ 5., nan],
[ 6., nan]])
to
array([[10., 10.],
[20., 13.]])
I want to filter both columns to exclude the values where the second column is NaN. I want to retain the shape so I can run statistics on the numbers like correlation. Any ideas?
If I try ~np.isnan the array loses its shape, which I want to retain. No pandas please!
CodePudding user response:
Say data is your NumPy array. Then new_data is the array with the NaN rows removed:
new_data = data[~np.isnan(data).any(axis=1)]
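For comparison, here is a small sketch of how the two kinds of mask behave, using the sample values from the question. The names flat and rows are only illustrative. An element-wise mask returns a flattened 1D result, while a per-row mask keeps the two columns together.

```python
import numpy as np

data = np.array([[10., 10.],
                 [20., 13.],
                 [5., np.nan],
                 [6., np.nan]])

# Element-wise 2D mask: selects individual values, so the result is 1D
flat = data[~np.isnan(data)]

# Row mask: keep rows that contain no NaN in any column, result stays 2D
rows = data[~np.isnan(data).any(axis=1)]

print(flat.shape)  # (6,)
print(rows.shape)  # (2, 2)
```

Note that the row mask above drops a row if any column is NaN; if only the second column should be checked, the mask would be built from data[:, 1] instead.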
CodePudding user response:
arr[~np.isnan(arr[:, 1])]
This filters the rows using only a 1D array of booleans.
I guess that when you say "I've tried ~np.isnan", you mean that you tried arr[~np.isnan(arr)]. But since arr is 2D (shape (115, 2)), so is ~np.isnan(arr), which is a boolean array of shape (115, 2). Indexing with such an array returns all the values of arr where the mask is True, as a flat 1D array. It cannot keep the original shape, since the two columns may not have the same number of remaining elements.
~np.isnan(arr[:, 1]), on the other hand, is a 1D array of booleans with shape (115,). Indexing with it just selects the rows.
See this example of indexing with arrays of booleans:
import numpy as np
arr = np.arange(10).reshape(5, -1)
#array([[0, 1],
# [2, 3],
# [4, 5],
# [6, 7],
# [8, 9]])
# Indexing with a 2D array of booleans (same shape as arr, so 5x2 here)
arr[[[True, False], [False, False], [True, True], [False, False], [False, True]]]
#array([0, 4, 5, 9])
# I select the 1st element of 1st row, both elements of 3rd row, and last element of last row, so 0,4,5,9
# Indexing only along 1st axis, so with a 1d-array of 5 booleans, to select rows
arr[[True, False, True, True, False]]
#array([[0, 1],
# [4, 5],
# [6, 7]])
# I selected 1st, 3rd and 4th rows
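Applied to the question, a minimal sketch might look like the following. The array values are made up for illustration, and using np.corrcoef for the correlation is an assumption about which statistic is wanted.

```python
import numpy as np

# Made-up data: NaN appears only in the second column
arr = np.array([[10., 10.],
                [20., 13.],
                [5., np.nan],
                [30., 21.],
                [6., np.nan],
                [40., 25.]])

# 1D boolean mask, one entry per row: True where the second column is not NaN
mask = ~np.isnan(arr[:, 1])

# Selecting rows keeps both columns aligned; the result is still 2D
filtered = arr[mask]
print(filtered.shape)  # (4, 2)

# Pearson correlation between the two columns of the filtered array
r = np.corrcoef(filtered[:, 0], filtered[:, 1])[0, 1]
print(r)
```

Because the mask is built from one column only, each kept row still pairs a first-column value with its own second-column value, which is what a correlation needs.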