Is there a way to remove specific elements in an array using numpy functions?

Is there a way to remove specific elements from an array using numpy.delete, a boolean mask, or any other numpy function, based on a condition on the values themselves?

For example:

import numpy as np

arr = np.random.chisquare(6, 10)

array([4.61518458, 4.80728541, 4.59749491, 3.44053946, 5.52507358,
       7.97092747, 2.01946678, 6.26877508, 3.68286537, 2.06759469])

For testing purposes, I would like to know whether some numpy function can remove all elements that are divisible by a given value k.

>>> np.delete(arr, 1, 0)
[4.61518458 4.59749491 3.44053946 5.52507358 7.97092747 2.01946678
 6.26877508 3.68286537 2.06759469]

The call np.delete(arr, 1, 0) only removes the value at that one position. Is there a way to delete multiple values based on a condition, such as a lambda function or the divisibility test described above?

Answer:

Yes, this is part of numpy's boolean (mask) indexing. Applying a comparison operator (or any other element-wise function that returns booleans) to the whole array produces an array of booleans, with True for the elements to keep and False for the ones to discard. For example, to keep only the elements less than 5:

selections = arr < 5       # boolean mask, True where the element is kept
arr = arr[selections]

That will only keep the elements where selections is True.
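Since the question asks about np.delete specifically, here is a minimal sketch of the same filter done both ways. It assumes a seeded generator (np.random.default_rng(0)) in place of np.random.chisquare so the result is reproducible; np.flatnonzero turns the boolean mask into the integer positions that np.delete expects.

```python
import numpy as np

# Seeded generator so the example is reproducible (assumption: not in the question).
rng = np.random.default_rng(0)
arr = rng.chisquare(6, 10)

# Boolean mask: True for the elements we want to remove.
to_remove = arr >= 5

# Option 1: boolean indexing keeps the elements where the mask is False.
kept_by_mask = arr[~to_remove]

# Option 2: np.delete takes the integer positions of the elements to drop;
# np.flatnonzero converts the mask into those positions.
kept_by_delete = np.delete(arr, np.flatnonzero(to_remove))

print(kept_by_mask)
print(np.array_equal(kept_by_mask, kept_by_delete))  # True
```

Both routes preserve the original order; the mask version is usually the more direct one.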

Of course, since your values are arbitrary floats, they will essentially never be exact multiples of an integer k, but that's another story.
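To illustrate the distinction: floats that are whole multiples of k do pass an exact remainder test, while arbitrary values do not. A small sketch using the first three values from the question and a hypothetical k = 2:

```python
import numpy as np

k = 2  # hypothetical divisor
# Floats that happen to be whole multiples of k pass the exact test...
whole = np.array([4.0, 6.0, 8.0])
print(whole % k == 0)    # [ True  True  True]

# ...but arbitrary floats such as chi-square draws essentially never do.
samples = np.array([4.61518458, 4.80728541, 4.59749491])
print(samples % k == 0)  # [False False False]
```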

Answer:

To test divisibility, building on Tim's answer:

k = 6  # the divisor
multiples = arr[arr % k == 0]  # the elements divisible by k
arr = arr[arr % k != 0]        # remove them from arr
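For the lambda-style condition mentioned in the question, one option is np.vectorize, which turns a Python predicate into an element-wise boolean mask. A sketch on hypothetical integer data (np.arange(1, 13)); np.vectorize is essentially a Python loop, so the direct arr % k comparison is preferable whenever the condition can be written as array arithmetic.

```python
import numpy as np

arr = np.arange(1, 13)  # hypothetical integer data: 1..12
k = 6

# Wrap an arbitrary Python predicate (here a lambda) so it applies element-wise;
# otypes=[bool] fixes the output dtype of the resulting mask.
is_multiple = np.vectorize(lambda v: v % k == 0, otypes=[bool])
mask = is_multiple(arr)

# Remove the elements for which the predicate is True.
filtered = arr[~mask]
print(filtered)  # [ 1  2  3  4  5  7  8  9 10 11]

# Same mask as the vectorized comparison, which avoids the Python-level loop.
print(np.array_equal(mask, arr % k == 0))  # True
```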

Answer:

Integers

If you're working with integers, something like @Ali_Sh's answer will work:

>>> x = np.array([3, 5, 6, 7, 9, 0])
>>> x[x%2==0]
array([6, 0])

or, to eliminate them:

>>> x[x%2!=0]
array([3, 5, 7, 9])

Floats

If you have floats, as it looks like you do, numerical issues can make this slightly more challenging:

>>> k = 1.0000002300000000450001000101
>>> x = np.array([k * i for i in range(1,10)] + [0.5,])
>>> x
array([1.00000023, 2.00000046, 3.00000069, 4.00000092, 5.00000115,
       6.00000138, 7.00000161, 8.00000184, 9.00000207, 0.5])
>>> x[x%k==0]
array([1.00000023, 2.00000046, 3.00000069, 4.00000092, 6.00000138,
       8.00000184])

We've missed a few that we would like to have caught (5.000..., 7.000... and 9.000...). If we look at the remainders themselves, we see that some of the missed numbers are almost zero (((7*k)%k)/k = 4.44089108e-16) and others are almost equal to k (((5*k)%k)/k = 1.00000000e+00):

>>> (x % k)/k
array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       1.00000000e+00, 0.00000000e+00, 4.44089108e-16, 0.00000000e+00,
       1.00000000e+00, 4.99999885e-01])

So the solution is to look not only for remainders that are exactly zero but also for those that are almost 0 or almost k. For that you need to define a tolerance (delta = 10**-10 below) and keep the values whose remainder falls within it.

>>> delta = 10**-10
>>> x[np.logical_or((x % k) <= delta, (k - (x % k)) <= delta)]
array([1.00000023, 2.00000046, 3.00000069, 4.00000092, 5.00000115,
       6.00000138, 7.00000161, 8.00000184, 9.00000207])
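The same tolerance test can be written with np.isclose. A minimal sketch; near_multiple_mask is a hypothetical helper name, and rtol=0 is set so that only the absolute tolerance atol applies (np.isclose otherwise adds a relative term).

```python
import numpy as np

def near_multiple_mask(x, k, atol=1e-10):
    """Hypothetical helper: True where x is within atol of an integer multiple of k.

    A remainder close to 0 or close to k both count as "divisible",
    mirroring the logical_or test above.
    """
    r = np.mod(x, k)
    near_zero = np.isclose(r, 0.0, rtol=0.0, atol=atol)
    near_k = np.isclose(r, k, rtol=0.0, atol=atol)
    return near_zero | near_k

k = 1.0000002300000000450001000101
x = np.array([k * i for i in range(1, 10)] + [0.5])

print(x[near_multiple_mask(x, k)])   # the nine multiples of k
print(x[~near_multiple_mask(x, k)])  # [0.5]
```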

In your case you want to eliminate them, so instead use the complementary condition:

>>> x[np.logical_and((x % k) > delta, (k - (x % k)) > delta)]
array([0.5])
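A different technique that avoids handling the "almost 0" and "almost k" cases separately is to divide by k and compare the quotient with its nearest integer. This is a sketch of that alternative, not the method used above; drop_multiples is a hypothetical name, and here tol is measured in units of k (it accepts |x - m*k| <= tol*|k|).

```python
import numpy as np

def drop_multiples(x, k, tol=1e-10):
    """Hypothetical helper: return x without the elements that are
    (approximately) integer multiples of k, using a rounding test."""
    q = x / k
    near_multiple = np.abs(q - np.round(q)) <= tol
    return x[~near_multiple]

k = 1.0000002300000000450001000101
x = np.array([k * i for i in range(1, 10)] + [0.5])
print(drop_multiples(x, k))  # [0.5]

# It also works on integer input, matching the x%2!=0 example.
print(drop_multiples(np.array([3, 5, 6, 7, 9, 0]), 2))  # [3 5 7 9]
```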