Substitute portions of an array in Python

I'm trying to find out the easiest way to substitute portions of an array, on the basis of another array, in Python.

I've something like the following:

data = np.array([[0.3,15],[1.6,24],[2.1,53],[3.8,52],[4.1,13],
                 [5.4,87],[6.5,13],[7.3,62],[8.7,83],[9.6,82],
                 [10.3,38],[11.2,11],[12.6,59],[13.8,22],
                 [14.9,74],[15.4,2]])

and I want to set the second column to nan wherever the first column falls between certain start and stop values (inclusive):

forbid_start = np.array([1.4,7.9,13.0])
forbid_stop  = np.array([3.8,10.2,14.9])

to get an array like this:

data2 = np.array([[0.3,15],[1.6,np.nan],[2.1,np.nan],[3.8,np.nan],[4.1,13],
                  [5.4,87],[6.5,13],[7.3,62],[8.7,np.nan],[9.6,np.nan],
                  [10.3,38],[11.2,11],[12.6,59],[13.8,np.nan],
                  [14.9,np.nan],[15.4,2]])

I'm trying with some cycles, but I guess it's not the right way to address the problem... Thanks in advance.

CodePudding user response：

You can loop over the length of forbid_start and forbid_stop to get each pair of start/stop as below:

for i in range(len(forbid_start)):
    start = forbid_start[i]
    end = forbid_stop[i]

You can then use a list comprehension to replace the values that fall between each start/stop pair (inclusive) with nan, like this:

data = [j if ((j < start) or (j > end)) else np.nan for j in data]

Full code:

import numpy as np

data = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

forbid_start = [1,6,14,18]
forbid_stop  = [2,9,16,20]

for i in range(len(forbid_start)):
    start = forbid_start[i]
    end = forbid_stop[i]
    data = [j if ((j < start) or (j > end)) else np.nan for j in data]

Output:

[0,
 nan,
 nan,
 3,
 4,
 5,
 nan,
 nan,
 nan,
 nan,
 10,
 11,
 12,
 13,
 nan,
 nan,
 nan,
 17,
 nan,
 nan,
 nan]
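The example above works on a flat list of integers. A minimal sketch of the same loop-plus-comprehension idea applied to the two-column data from the question might look like this, assuming the rows are held as a plain list of [x, y] pairs (the name result is only illustrative):

```python
import math
import numpy as np

# Two-column data from the question, kept as a plain list of [x, y] rows.
data = [[0.3, 15], [1.6, 24], [2.1, 53], [3.8, 52], [4.1, 13],
        [5.4, 87], [6.5, 13], [7.3, 62], [8.7, 83], [9.6, 82],
        [10.3, 38], [11.2, 11], [12.6, 59], [13.8, 22],
        [14.9, 74], [15.4, 2]]

forbid_start = [1.4, 7.9, 13.0]
forbid_stop = [3.8, 10.2, 14.9]

result = data
for start, end in zip(forbid_start, forbid_stop):
    # Rebuild every row; replace y with nan when x lies in [start, end].
    result = [[x, np.nan] if start <= x <= end else [x, y] for x, y in result]

# Positions of the rows whose second value was replaced.
nan_rows = [i for i, (_, y) in enumerate(result) if math.isnan(y)]
print(nan_rows)
```

Each pass builds a new list, so the original data list is left unchanged.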

CodePudding user response：

Assuming the index arrays are already sorted (or can be sorted), one way is to iterate over them, build a range object for each start/stop pair, and use the generated indexes to assign None (or whatever other sentinel value you want):

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
forbid_start = [1, 6, 14, 18]
forbid_stop = [2, 9, 16, 20]

for start, stop in zip(forbid_start, forbid_stop):
    for i in range(start, stop + 1):
        data[i] = None

print(data)

Outputs

[0, None, None, 3, 4, 5, None, None, None, None, 10, 11, 12, 13, None, None, None, 17, None, None, None]
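The range approach above needs integer positions, while the question gives float bounds on the first column. One possible bridge, sketched here under the assumption that the first column is sorted in ascending order, is to turn each start/stop value into a slice boundary with np.searchsorted (the names lo, hi and result are illustrative):

```python
import numpy as np

data = np.array([[0.3, 15], [1.6, 24], [2.1, 53], [3.8, 52], [4.1, 13],
                 [5.4, 87], [6.5, 13], [7.3, 62], [8.7, 83], [9.6, 82],
                 [10.3, 38], [11.2, 11], [12.6, 59], [13.8, 22],
                 [14.9, 74], [15.4, 2]])
forbid_start = np.array([1.4, 7.9, 13.0])
forbid_stop = np.array([3.8, 10.2, 14.9])

x = data[:, 0]  # assumption: sorted in ascending order

# First position with x >= start, and one past the last position with x <= stop.
lo = np.searchsorted(x, forbid_start, side="left")
hi = np.searchsorted(x, forbid_stop, side="right")

result = data.copy()
for i, j in zip(lo, hi):
    result[i:j, 1] = np.nan  # half-open slice covers the inclusive interval

print(result)
```

If the first column is not sorted, this sketch does not apply and a boolean mask on the values is the safer choice.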

CodePudding user response：

Using zip and slice assignment in plain Python:

data = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

forbid_start = [1,6,14,18]
forbid_stop  = [2,9,16,20]

for a,b in zip(forbid_start,forbid_stop):
    data[a:b+1] = [float('nan')]*((b-a)+1)

print(data)

OUTPUT

[0, nan, nan, 3, 4, 5, nan, nan, nan, nan, 10, 11, 12, 13, nan, nan, nan, 17, nan, nan, nan]

CodePudding user response：

filtering a numpy array based on the absence of the value in ranges

You can bin the values twice with numpy.digitize, once against the starts and once against the stops, and mask the values that land in the same bin both times:

data = np.array([[0.3,15],[1.6,24],[2.1,53],[3.8,52],[4.1,13],
                 [5.4,87],[6.5,13],[7.3,62],[8.7,83],[9.6,82],
                 [10.3,38],[11.2,11],[12.6,59],[13.8,22],[14.9,74],
                 [15.4,2]])

# use lists here, not arrays (or convert)
forbid_start = [1.4,7.9,13.0]
forbid_stop  = [3.8,10.2,14.9]

# note: data[:, 1] bins the second column; use data[:, 0] to test the first column
m1 = np.digitize(data[:, 1], forbid_start + [np.inf])
m2 = np.digitize(data[:, 1], [0] + forbid_stop, right=True)

data[m1 == m2, 1] = np.nan

output:

array([[ 0.3, 15. ],
       [ 1.6, 24. ],
       [ 2.1, 53. ],
       [ 3.8, 52. ],
       [ 4.1,  nan],
       [ 5.4, 87. ],
       [ 6.5,  nan],
       [ 7.3, 62. ],
       [ 8.7, 83. ],
       [ 9.6, 82. ],
       [10.3, 38. ],
       [11.2, 11. ],
       [12.6, 59. ],
       [13.8, 22. ],
       [14.9, 74. ],
       [15.4,  nan]])
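The output above masks rows according to the values in the second column, which differs from the result requested in the question. A sketch of the same digitize idea applied to the first column is given below. It assumes the intervals are sorted and do not overlap, and it swaps the leading 0 for -np.inf so that values at or below zero are not masked by accident; the names x, mask and result are illustrative.

```python
import numpy as np

data = np.array([[0.3, 15], [1.6, 24], [2.1, 53], [3.8, 52], [4.1, 13],
                 [5.4, 87], [6.5, 13], [7.3, 62], [8.7, 83], [9.6, 82],
                 [10.3, 38], [11.2, 11], [12.6, 59], [13.8, 22], [14.9, 74],
                 [15.4, 2]])

# Assumption: intervals are sorted and non-overlapping.
forbid_start = [1.4, 7.9, 13.0]
forbid_stop = [3.8, 10.2, 14.9]

x = data[:, 0]

# m1 counts how many starts are <= x; m2 - 1 counts how many stops are < x.
m1 = np.digitize(x, forbid_start + [np.inf])
m2 = np.digitize(x, [-np.inf] + forbid_stop, right=True)

# Equal bins mean x is past the k-th start but not past the k-th stop.
mask = m1 == m2

result = data.copy()
result[mask, 1] = np.nan
print(result)
```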

previous answer (indices)

pure python

You can use a set of forbidden indices and a simple list comprehension:

forbid = {x for a,b in zip(forbid_start, forbid_stop) for x in range(a,b+1)}
# {1, 2, 6, 7, 8, 9, 14, 15, 16, 18, 19, 20}

data2 = [float('nan') if i in forbid else v for i,v in enumerate(data)]

output:

[0, nan, nan, 3, 4, 5, nan, nan, nan, nan, 10, 11, 12, 13, nan, nan, nan, 17, nan, nan, nan]

numpy

You can build an indexer with np.r_ and assign np.nan at those positions:

a = np.array(data)
# array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
#        17, 18, 19, 20])

idx = np.r_[tuple(slice(a,b+1) for a,b in zip(forbid_start, forbid_stop))]
# array([ 1,  2,  6,  7,  8,  9, 14, 15, 16, 18, 19, 20])

b = a.astype(float)
b[idx] = np.nan

# array([ 0., nan, nan,  3.,  4.,  5., nan, nan, nan, nan, 10., 11., 12.,
#        13., nan, nan, nan, 17., nan, nan, nan])

CodePudding user response：

Is this what you mean by "doing this with cycles"? Your problem description isn't the clearest, so I'm relying more on comparing the input and output values (i.e. where nan appears in the desired result):

In [15]: data2 = data.copy()

In [16]: for start, stop in zip(forbid_start, forbid_stop):
    ...:     idx = (start<=data[:,0]) & (data[:,0]<=stop)
    ...:     data2[idx,1] = np.nan
    ...: 
In [17]: data2
Out[17]: 
array([[ 0.3, 15. ],
       [ 1.6,  nan],
       [ 2.1,  nan],
       [ 3.8,  nan],
       [ 4.1, 13. ],
       [ 5.4, 87. ],
       [ 6.5, 13. ],
       [ 7.3, 62. ],
       [ 8.7,  nan],
       [ 9.6,  nan],
       [10.3, 38. ],
       [11.2, 11. ],
       [12.6, 59. ],
       [13.8,  nan],
       [14.9,  nan],
       [15.4,  2. ]])

There is still iteration over the start/stop ranges, but everything else is straightforward numpy. I'm not sure it's worth trying to "vectorize" this any further.

We could collect all the idx first:

In [25]: idx = np.zeros(data.shape[0],bool)
    ...: for start, stop in zip(forbid_start, forbid_stop):
    ...:     idx[(start<=data[:,0])&(data[:,0]<=stop)] = True
    ...: 
In [26]: idx
Out[26]: 
array([False,  True,  True,  True, False, False, False, False,  True,
        True, False, False, False,  True,  True, False])

Another way to get the idx mask is to broadcast the comparison against all start/stop pairs at once:

In [40]: x1 = forbid_start<=data[:,[0]]
    ...: x2 = forbid_stop>=data[:,[0]]
    ...: idx = np.any(x1&x2, axis=1)
In [41]: idx
Out[41]: 
array([False,  True,  True,  True, False, False, False, False,  True,
        True, False, False, False,  True,  True, False])
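The broadcasting mask could be wrapped in a small helper so it can be reused on other arrays. This is only a sketch: the function name mask_intervals is hypothetical, and it assumes the x values are in column 0 and the values to blank are in column 1.

```python
import numpy as np

def mask_intervals(data, starts, stops):
    """Return a float copy of data with column 1 set to nan wherever
    column 0 lies inside any inclusive [start, stop] interval."""
    starts = np.asarray(starts, dtype=float)
    stops = np.asarray(stops, dtype=float)
    x = data[:, [0]]                        # shape (n, 1), broadcasts against (k,)
    inside = (starts <= x) & (x <= stops)   # shape (n, k), one column per interval
    out = data.astype(float)                # astype returns a new array
    out[inside.any(axis=1), 1] = np.nan
    return out

data = np.array([[0.3, 15], [1.6, 24], [2.1, 53], [3.8, 52], [4.1, 13],
                 [5.4, 87], [6.5, 13], [7.3, 62], [8.7, 83], [9.6, 82],
                 [10.3, 38], [11.2, 11], [12.6, 59], [13.8, 22],
                 [14.9, 74], [15.4, 2]])

data2 = mask_intervals(data, [1.4, 7.9, 13.0], [3.8, 10.2, 14.9])
print(data2)
```

Unlike the sorted-input approaches, this does not require the rows or the intervals to be ordered, at the cost of an intermediate boolean array of shape (rows, intervals).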