Delete negative elements which are between positives only

Time:05-02

a = [1, 3, 6, -2, 4, 5, 8, -3, 9,
     2, -5, -7, -9, 3, 6, -7, -6, 2]

I want to get:

a = [1, 3, 6, 4, 5, 8, 9, 2,
     -5, -7, -9, 3, 6, -7, -6, 2]

This deletes only the 4th and 8th elements, which are single negative elements between two positive elements.

import numpy as np

a = [1, 3, 6, -2, 4, 5, 8, -3, 9,
     2, -5, -7, -9, 3, 6, -7, -6, 2]

for i in range(len(a)):
    if a[i] < 0 and a[i - 1] > 0 and a[i + 1] > 0:
        np.delete(a[i])
print(a)

This did not work. What do I need to fix?

CodePudding user response:

Because you ask about numpy in the subject line and also attempt to use np.delete() in your code, I assume you intend for a to be a numpy array.

Here is a way to do what your question asks using vectorized operations in numpy:

import numpy as np
a = np.array([1,3,6,-2,4,5,8,-3,9,2,-5,-7,-9, 3, 6, -7, -6, 2])
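# np.nan is the spelling that works across NumPy versions (the np.NaN alias was removed in NumPy 2.0)
b = np.concatenate([a[1:], [np.nan]])
c = np.concatenate([[np.nan], a[:-1]])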
d = (a<0)&(b>0)&(c>0)
print(a[~d])

Output:

[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]

What we've done is shift a one position to the left, filling the gap on the right with NaN (b), and one position to the right, filling the gap on the left with NaN (c). We then build a boolean mask d with the vectorized comparison and boolean operators <, > and &. The mask is True only at the single negative values sandwiched between positives, which are the ones we want to delete. Finally, we use the ~ operator to invert the mask and use the result to filter the unwanted negative values out of a.
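To make the shifting step concrete, here is a small sketch that prints the mask for a short input. The input small and the names right_neighbor, left_neighbor and drop are illustrative only and do not appear in the answer above.

```python
import numpy as np

# Short illustrative input (not taken from the question).
small = np.array([1, -2, 3, -4, -5, 6])

# Right-hand neighbor of each element; the last element has none, so pad with NaN.
right_neighbor = np.concatenate([small[1:], [np.nan]])
# Left-hand neighbor of each element; the first element has none, so pad with NaN.
left_neighbor = np.concatenate([[np.nan], small[:-1]])

# Any comparison with NaN is False, so the two edge elements are never flagged.
drop = (small < 0) & (right_neighbor > 0) & (left_neighbor > 0)

print(drop)          # only index 1 (-2) is flagged
print(small[~drop])  # -4 and -5 stay because each has a negative neighbor
```

Only the -2 is removed here: -4 and -5 form a run of negatives, so neither has positives on both sides.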

UPDATE: Here are some timeit() comparisons of several variations on answers given for this question using NumPy 1.22.2.

The fastest of the 8 strategies is: a = np.concatenate([a[:1], a[1:-1][(a[1:-1]>=0)|(a[2:]<=0)|(a[:-2]<=0)], a[-1:]])

A close second is: a = a[np.concatenate([[True], ~((a[1:-1]<0)&(a[2:]>0)&(a[:-2]>0)), [True]])]

The strategies using np.r_, either with np.delete() or with a boolean mask and [] syntax, are about twice as slow as the fastest.

The strategy using numpy.roll() is about 3 times as slow as the fastest. Note: As highlighted in a comment by @Kelly Bundy, the roll() strategy in the benchmark does not give a correct answer to this question in all cases (though for the particular input example it happens to). I have nevertheless included it in the benchmark because the performance of roll() relative to concatenate() and r_ may be of general interest beyond the narrow context of this question.
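The edge problem with roll() is easy to reproduce on a three-element input. The sketch below uses two hypothetical helpers, drop_with_roll and drop_with_slices, which are not part of the benchmark. Both test the neighbors with > 0, as the question specifies, so the only difference between them is the wraparound.

```python
import numpy as np

def drop_with_roll(a):
    # np.roll wraps around: np.roll(a, 1)[0] is a[-1] and np.roll(a, -1)[-1] is a[0].
    bad = (a < 0) & (np.roll(a, 1) > 0) & (np.roll(a, -1) > 0)
    return a[~bad]

def drop_with_slices(a):
    # Only interior elements are tested; the first and last are always kept.
    if a.shape[0] < 3:
        return a
    interior_bad = (a[1:-1] < 0) & (a[:-2] > 0) & (a[2:] > 0)
    keep = np.concatenate([[True], ~interior_bad, [True]])
    return a[keep]

edge = np.array([-1, 2, 3])
print(drop_with_roll(edge))    # the leading -1 is dropped because it is compared with the trailing 3
print(drop_with_slices(edge))  # the leading -1 is kept
```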

Results:

foo_1 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_2 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_3 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_4 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_5 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_6 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_7 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
foo_8 output:
[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
Timeit results:
foo_1 ran in 1.2354546000715346e-05 seconds using 100000 iterations
foo_2 ran in 1.0962473000399769e-05 seconds using 100000 iterations
foo_3 ran in 7.733614000026136e-06 seconds using 100000 iterations
foo_4 ran in 7.751871000509709e-06 seconds using 100000 iterations
foo_5 ran in 5.856722998432815e-06 seconds using 100000 iterations
foo_6 ran in 7.5727709988132115e-06 seconds using 100000 iterations
foo_7 ran in 1.7790602000895887e-05 seconds using 100000 iterations
foo_8 ran in 5.435103999916464e-06 seconds using 100000 iterations

Code that generated the results:

import numpy as np

a = np.array([1,3,6,-2,4,5,8,-3,9,2,-5,-7,-9, 3, 6, -7, -6, 2])
from timeit import timeit
def foo_1(a):
    a = a if a.shape[0] < 2 else np.delete(a, np.r_[False, (a[1:-1] < 0) & (a[:-2] > 0) & (a[2:] > 0), False])
    return a
def foo_2(a):
    a = a if a.shape[0] < 2 else a[np.r_[True, ~((a[1:-1] < 0) & (a[:-2] > 0) & (a[2:] > 0)), True]]
    return a
def foo_3(a):
    b = np.concatenate([a[1:], [np.nan]])
    c = np.concatenate([[np.nan], a[:-1]])
    d = (a<0)&(b>0)&(c>0)
    a = a[~d]
    return a
def foo_4(a):
    a = a[~((a<0)&(np.concatenate([a[1:], [np.nan]])>0)&(np.concatenate([[np.nan], a[:-1]])>0))]
    return a
def foo_5(a):
    a = a if a.shape[0] < 2 else a[np.concatenate([[True], ~((a[1:-1]<0)&(a[2:]>0)&(a[:-2]>0)), [True]])]
    return a
def foo_6(a):
    a = a if a.shape[0] < 2 else np.delete(a, np.concatenate([[False], (a[1:-1]<0)&(a[2:]>0)&(a[:-2]>0), [False]]))
    return a
def foo_7(a):
    mask_bad = (
       (a < 0) &  # the value is < 0 AND
       (np.roll(a,1) >= 0) & # the value to the left is >= 0 (wraps around at index 0)
       (np.roll(a,-1) >= 0) # the value to the right is >= 0 (wraps around at the last index)
    )
    mask_good = ~mask_bad
    a = a[mask_good]
    return a
def foo_8(a):
    a = np.concatenate([a[:1], a[1:-1][(a[1:-1]>=0)|(a[2:]<=0)|(a[:-2]<=0)], a[-1:]])
    return a

foo_count = 8
for foo in ['foo_' + str(i + 1) for i in range(foo_count)]:
    print(f'{foo} output:')
    print(eval(f"{foo}(a)"))

n = 100000
print(f'Timeit results:')
for foo in ['foo_' + str(i + 1) for i in range(foo_count)]:
    t = timeit(f"{foo}(a)", setup=f"from __main__ import a, {foo}", number=n) / n
    print(f'{foo} ran in {t} seconds using {n} iterations')

CodePudding user response:

Your conditional logic

if a[i] < 0 and a[i - 1] > 0 and a[i + 1] > 0

seems sound and readable to me. But it would have issues with the boundary cases:

[1, 2, -3] -> IndexError: list index out of range
[-1, 2, 3] -> [2, 3]
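Both boundary cases can be reproduced with plain lists. The helper naive_filter below is a hypothetical stand-in for the question's loop: it applies the same condition to every index but collects the kept values instead of calling np.delete.

```python
def naive_filter(values):
    # Same condition as in the question, applied to every index including the edges.
    kept = []
    for i in range(len(values)):
        is_sandwiched = values[i] < 0 and values[i - 1] > 0 and values[i + 1] > 0
        if not is_sandwiched:
            kept.append(values[i])
    return kept

# At i == 0, values[i - 1] is values[-1], i.e. the LAST element, so -1 is wrongly dropped.
print(naive_filter([-1, 2, 3]))

# At the last index, values[i + 1] is past the end of the list.
try:
    naive_filter([1, 2, -3])
except IndexError as exc:
    print("IndexError:", exc)
```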

Handling it properly could be as simple as skipping the first and last elements of your list with

for i in range(1, len(a) - 1)

Test

import numpy as np


def del_neg_between_pos(a):
    delete_idx = []
    for i in range(1, len(a) - 1):
        if a[i] < 0 and a[i - 1] > 0 and a[i + 1] > 0:
            delete_idx.append(i)

    return np.delete(a, delete_idx)


if __name__ == "__main__":
    a1 = [1, 3, 6, -2, 4, 5, 8, -3, 9, 2, -5, -7, -9, 3, 6, -7, -6, 2]
    a2 = [1, 2, -3]
    a3 = [-1, 2, 3]
    for a in [a1, a2, a3]:
        print(del_neg_between_pos(a))

Output

[ 1  3  6  4  5  8  9  2 -5 -7 -9  3  6 -7 -6  2]
[ 1  2 -3]
[-1  2  3]

CodePudding user response:

A solution that handles edges correctly and doesn't create an unholy number of temporary arrays:

a = np.delete(a, np.r_[False, (a[1:-1] < 0) & (a[:-2] > 0) & (a[2:] > 0), False])

Alternatively, you can create the positive rather than the negative mask

a = a[np.r_[True, (a[1:-1] >= 0) | (a[:-2] <= 0) | (a[2:] <= 0), True]]

Since np.concatenate is faster than np.r_, you could rephrase the masks as

np.concatenate(([False], (a[1:-1] < 0) & (a[:-2] > 0) & (a[2:] > 0), [False]))

and

np.concatenate(([True], (a[1:-1] >= 0) | (a[:-2] <= 0) | (a[2:] <= 0), [True]))

In some cases, you might get extra mileage out of applying np.where(...)[0] or np.flatnonzero to the mask. This sometimes helps because it avoids having to count the masked elements twice.
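As a rough sketch of that last suggestion, the positive mask can be converted to integer positions before indexing. The names keep, keep_idx and same_idx are illustrative. Whether this is actually faster depends on the array size and NumPy version, so it is worth timing on your own data.

```python
import numpy as np

a = np.array([1, 3, 6, -2, 4, 5, 8, -3, 9, 2, -5, -7, -9, 3, 6, -7, -6, 2])

# Positive mask: True for every element that should be kept.
keep = np.concatenate(([True], (a[1:-1] >= 0) | (a[:-2] <= 0) | (a[2:] <= 0), [True]))

# Convert the boolean mask to integer positions, then index with those positions.
keep_idx = np.flatnonzero(keep)
result = a[keep_idx]

# For a 1-D mask, np.where(mask)[0] yields the same positions.
same_idx = np.where(keep)[0]

print(result)
```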

CodePudding user response:

import numpy

a = numpy.array([1,3,6,-2,4,5,8,-3,9,2,-5,-7,-9, 3, 6, -7, -6, 2])
mask_bad = (
   (a < 0) &  # the value is < 0 AND
   (numpy.roll(a,1) >= 0) & # the value to the left is >= 0
   (numpy.roll(a,-1) >= 0) # the value to the right is >= 0
)
mask_good = ~mask_bad
print(a[mask_good])

This is one way you might do it, although it has an issue at the edges: numpy.roll wraps around, so the first element is compared with the last one and vice versa.

CodePudding user response:

Here is a simple one-liner:

import numpy as np

a = np.array([1, 3, 6, -2, 4, 5, 8, -3, 9, 2, -5, -7, -9, 3, 6, -7, -6, 2])
n = np.where((a[1: -1] < 0) & (a[2:] > 0) & (a[:-2] > 0))[0] + 1
print(n)

Output:

[3 7]
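The snippet above stops at the indices. Assuming the goal is still the filtered array from the question, one possible last step is to pass those indices to np.delete, which returns a new array rather than modifying a in place. The name result is illustrative.

```python
import numpy as np

a = np.array([1, 3, 6, -2, 4, 5, 8, -3, 9, 2, -5, -7, -9, 3, 6, -7, -6, 2])

# Positions of single negatives with a positive on each side.
# The mask covers a[1:-1], so add 1 to get positions in the full array.
n = np.where((a[1:-1] < 0) & (a[2:] > 0) & (a[:-2] > 0))[0] + 1

# np.delete leaves `a` untouched and returns a copy without the listed positions.
result = np.delete(a, n)
print(result)
```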