I'm trying to find out the easiest way to substitute portions of an array, on the basis of another array, in Python.
I've something like the following:
data = np.array([[0.3,15],[1.6,24],[2.1,53],[3.8,52],[4.1,13],
[5.4,87],[6.5,13],[7.3,62],[8.7,83],[9.6,82],
[10.3,38],[11.2,11],[12.6,59],[13.8,22],
[14.9,74],[15.4,2]])
and I want to set to nan the second-column entries whose first-column value falls between certain starts and stops (bounds included):
forbid_start = np.array([1.4,7.9,13.0])
forbid_stop = np.array([3.8,10.2,14.9])
to get an array like this:
data2 = np.array([[0.3,15],[1.6,np.nan],[2.1,np.nan],[3.8,np.nan],[4.1,13],
[5.4,87],[6.5,13],[7.3,62],[8.7,np.nan],[9.6,np.nan],
[10.3,38],[11.2,11],[12.6,59],[13.8,np.nan],
[14.9,np.nan],[15.4,2]])
I'm trying with some cycles, but I guess it's not the right way to address the problem... Thanks in advance.
CodePudding user response:
You can loop over the length of forbid_start
and forbid_stop
to get each pair of start/stop as below:
for i in range(len(forbid_start)):
    start = forbid_start[i]
    end = forbid_stop[i]
You can then use list comprehension to update the list with nans for the values that fall between each start stop pair, like this:
data = [j if ((j < start) or (j > end)) else np.nan for j in data]
Full code:
import numpy as np
data = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
forbid_start = [1,6,14,18]
forbid_stop = [2,9,16,20]
for i in range(len(forbid_start)):
    start = forbid_start[i]
    end = forbid_stop[i]
    # keep values outside [start, end]; replace the rest with nan
    data = [j if ((j < start) or (j > end)) else np.nan for j in data]
Output:
[0, nan, nan, 3, 4, 5, nan, nan, nan, nan, 10, 11, 12, 13, nan, nan, nan, 17, nan, nan, nan]
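Note that the comprehension above compares the values themselves against start and end, not their positions; in the sample list each value happens to equal its index. A minimal sketch of the same idea on non-integer values, assuming inclusive bounds as in the question (the helper name mask_values is made up for this illustration):

```python
import math

def mask_values(values, starts, stops):
    """Return a copy of `values` with nan wherever start <= value <= stop."""
    return [
        math.nan
        if any(start <= v <= stop for start, stop in zip(starts, stops))
        else v
        for v in values
    ]

xs = [0.3, 1.6, 2.1, 3.8, 4.1]
masked = mask_values(xs, [1.4], [3.8])
print(masked)  # [0.3, nan, nan, nan, 4.1]
```

The original list is left untouched, and zip replaces the explicit range(len(...)) indexing.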
CodePudding user response:
Assuming the indices arrays are already sorted (or can be sorted), one way is to iterate over the arrays, build a range object and use the generated indexes to assign None
(or whatever other sentinel value you want):
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
forbid_start = [1, 6, 14, 18]
forbid_stop = [2, 9, 16, 20]
for start, stop in zip(forbid_start, forbid_stop):
    for i in range(start, stop + 1):
        data[i] = None
print(data)
Outputs
[0, None, None, 3, 4, 5, None, None, None, None, 10, 11, 12, 13, None, None, None, 17, None, None, None]
CodePudding user response:
Using zip
and simple python code.
data = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
forbid_start = [1,6,14,18]
forbid_stop = [2,9,16,20]
for a,b in zip(forbid_start,forbid_stop):
    data[a:b + 1] = [float('nan')]*((b-a) + 1)
print(data)
OUTPUT
[0, nan, nan, 3, 4, 5, nan, nan, nan, nan, 10, 11, 12, 13, nan, nan, nan, 17, nan, nan, nan]
CodePudding user response:
filtering a numpy array based on the absence of the value in ranges
You can compute min/max bins with numpy.digitize
and mask the values that are in the same bin for the min and max:
data = np.array([[0.3,15],[1.6,24],[2.1,53],[3.8,52],[4.1,13],
[5.4,87],[6.5,13],[7.3,62],[8.7,83],[9.6,82],
[10.3,38],[11.2,11],[12.6,59],[13.8,22],[14.9,74],
[15.4,2]])
# use lists here, not arrays (or convert), so that + concatenates
forbid_start = [1.4,7.9,13.0]
forbid_stop = [3.8,10.2,14.9]
# the ranges are tested against the first column; the second column is masked
m1 = np.digitize(data[:, 0], forbid_start + [np.inf])
# -np.inf as the lower pad also handles values <= 0
m2 = np.digitize(data[:, 0], [-np.inf] + forbid_stop, right=True)
data[m1 == m2, 1] = np.nan
output:
array([[ 0.3, 15. ],
[ 1.6, nan],
[ 2.1, nan],
[ 3.8, nan],
[ 4.1, 13. ],
[ 5.4, 87. ],
[ 6.5, 13. ],
[ 7.3, 62. ],
[ 8.7, nan],
[ 9.6, nan],
[10.3, 38. ],
[11.2, 11. ],
[12.6, 59. ],
[13.8, nan],
[14.9, nan],
[15.4, 2. ]])
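To see why comparing the two bin numbers works, here is a small sketch on a handful of x values taken from the first column of the question's data. It assumes the intervals are sorted and do not overlap, and it pads the stops with -np.inf (an assumption of this sketch) so that non-positive values are handled too. m1 counts how many starts are <= x, and m2 is one more than the number of stops strictly below x, so the two agree exactly when x lies inside an interval, bounds included.

```python
import numpy as np

x = np.array([0.3, 1.6, 3.8, 4.1, 8.7, 10.3, 14.9, 15.4])
forbid_start = [1.4, 7.9, 13.0]
forbid_stop = [3.8, 10.2, 14.9]

# number of starts that are <= x
m1 = np.digitize(x, forbid_start + [np.inf])
# 1 + number of stops that are < x (right=True makes the stop inclusive)
m2 = np.digitize(x, [-np.inf] + forbid_stop, right=True)
mask = m1 == m2

print(m1)    # [0 1 1 1 2 2 3 3]
print(m2)    # [1 1 1 2 2 3 3 4]
print(mask)  # [False  True  True False  True False  True False]
```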
previous answer (indices)
pure python
You can use a set
of forbidden indices and a simple list comprehension:
forbid = {x for a,b in zip(forbid_start, forbid_stop) for x in range(a,b + 1)}
# {1, 2, 6, 7, 8, 9, 14, 15, 16, 18, 19, 20}
data2 = [float('nan') if i in forbid else v for i,v in enumerate(data)]
output:
[0, nan, nan, 3, 4, 5, nan, nan, nan, nan, 10, 11, 12, 13, nan, nan, nan, 17, nan, nan, nan]
numpy
you can craft an indexer with np.r_
and replace by np.nan
a = np.array(data)
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
# 17, 18, 19, 20])
idx = np.r_[tuple(slice(a,b + 1) for a,b in zip(forbid_start, forbid_stop))]
# array([ 1, 2, 6, 7, 8, 9, 14, 15, 16, 18, 19, 20])
b = a.astype(float)
b[idx] = np.nan
# array([ 0., nan, nan, 3., 4., 5., nan, nan, nan, nan, 10., 11., 12.,
# 13., nan, nan, nan, 17., nan, nan, nan])
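For reuse, the np.r_ indexer can be wrapped in a small function. This is only a sketch: the name nan_index_ranges is hypothetical, the starts and stops are treated as integer positions with inclusive stops, and at least one range is assumed.

```python
import numpy as np

def nan_index_ranges(values, starts, stops):
    """Return a float copy of `values` with nan at positions start..stop (inclusive)."""
    out = np.array(values, dtype=float)  # float copy, so nan can be stored
    # np.r_ concatenates the slices into one flat array of indices
    idx = np.r_[tuple(slice(a, b + 1) for a, b in zip(starts, stops))]
    out[idx] = np.nan
    return out

result = nan_index_ranges(list(range(21)), [1, 6, 14, 18], [2, 9, 16, 20])
print(np.flatnonzero(np.isnan(result)))
# [ 1  2  6  7  8  9 14 15 16 18 19 20]
```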
CodePudding user response:
Is this what you mean by "doing this with cycles"? Your problem description isn't the clearest, so I'm relying on comparing input and output values (i.e. where nan
appears in the desired result):
In [16]: data2 = data.copy()
    ...: for start, stop in zip(forbid_start, forbid_stop):
    ...:     idx = (start<=data[:,0]) & (data[:,0]<=stop)
    ...:     data2[idx,1] = np.nan
    ...:
In [17]: data2
Out[17]:
array([[ 0.3, 15. ],
[ 1.6, nan],
[ 2.1, nan],
[ 3.8, nan],
[ 4.1, 13. ],
[ 5.4, 87. ],
[ 6.5, 13. ],
[ 7.3, 62. ],
[ 8.7, nan],
[ 9.6, nan],
[10.3, 38. ],
[11.2, 11. ],
[12.6, 59. ],
[13.8, nan],
[14.9, nan],
[15.4, 2. ]])
While there is iteration over the start/stop ranges, the rest is straightforward numpy. I'm not sure it's worth trying to "vectorize" this any further.
We could collect all the idx
first:
In [25]: idx = np.zeros(data.shape[0],bool)
    ...: for start, stop in zip(forbid_start, forbid_stop):
    ...:     idx[(start<=data[:,0])&(data[:,0]<=stop)] = True
    ...:
In [26]: idx
Out[26]:
array([False, True, True, True, False, False, False, False, True,
True, False, False, False, True, True, False])
Another way to get the idx
mask, by broadcasting the column of x values against the starts and stops:
In [40]: x1 = forbid_start<=data[:,[0]]
...: x2 = forbid_stop>=data[:,[0]]
...: idx = np.any(x1&x2, axis=1)
In [41]: idx
Out[41]:
array([False, True, True, True, False, False, False, False, True,
True, False, False, False, True, True, False])
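Putting the broadcasting version together with the question's data gives a complete, loop-free sketch. The function name forbidden_mask is made up here; the comparison builds an (n, k) boolean array for n rows and k intervals, so it assumes k is small enough for that to fit in memory.

```python
import numpy as np

def forbidden_mask(x, starts, stops):
    """True where any start <= x <= stop (inclusive bounds)."""
    x = np.asarray(x, dtype=float)[:, None]   # shape (n, 1)
    starts = np.asarray(starts, dtype=float)  # shape (k,)
    stops = np.asarray(stops, dtype=float)    # shape (k,)
    # broadcasting compares every x against every interval: shape (n, k)
    return np.any((starts <= x) & (x <= stops), axis=1)

data = np.array([[0.3, 15], [1.6, 24], [2.1, 53], [3.8, 52], [4.1, 13],
                 [5.4, 87], [6.5, 13], [7.3, 62], [8.7, 83], [9.6, 82],
                 [10.3, 38], [11.2, 11], [12.6, 59], [13.8, 22],
                 [14.9, 74], [15.4, 2]])

mask = forbidden_mask(data[:, 0], [1.4, 7.9, 13.0], [3.8, 10.2, 14.9])
data2 = data.copy()        # leave the original array untouched
data2[mask, 1] = np.nan
```

Unlike the np.digitize approach, this does not need the intervals to be sorted or non-overlapping.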