Given a 1D array
a=np.array([ 65, 251, 431])
and another 1D array, used to construct the boundaries:
b=np.array([ 4, 10, 18, 22, 28, 33, 40, 49, 72, 83, 90, 93, 99,
107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
443, 450, 459, 467, 473, 477, 483, 491, 495, 497])
For example, the two-point boundaries can be the coordinate pairs (4, 10), (4, 18), (4, 497), ..., (495, 497).
The objective is to find the closest boundary pair within which an integer (e.g., each integer in array a) resides.
For example, the closest boundary pair within which the value 65 resides is (49, 72).
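To make the objective concrete, here is a minimal sketch for a single value, assuming (as the code below does) that b is sorted in ascending order with no duplicates. The closest pair is the largest boundary strictly below the value together with the smallest boundary at or above it, which is exactly the l < dis_a <= t test used in the loop. The variable name value is illustrative only, and the sketch only works when the value lies inside the boundary range, because .max() or .min() of an empty selection raises a ValueError.

```python
import numpy as np

# Boundary array from the question (sorted ascending, no duplicates)
b = np.array([4, 10, 18, 22, 28, 33, 40, 49, 72, 83, 90, 93, 99,
              107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
              215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
              317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
              443, 450, 459, 467, 473, 477, 483, 491, 495, 497])

value = 65  # the example value from the question (illustrative name)

# Largest boundary strictly below the value: the "l" in l < value <= t
lb = b[b < value].max()
# Smallest boundary at or above the value: the "t" in l < value <= t
tb = b[b >= value].min()
print(lb, tb)  # 49 72
```

Because b is sorted, these two boundaries are always neighbours in b, which is why the loop only needs to compare consecutive pairs.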
The code below achieves this:
import numpy as np
import pandas as pd
a=np.array([ 65, 251, 431])
# Assumes `b` is sorted in ascending order with no duplicate values
b=np.array([ 4, 10, 18, 22, 28, 33, 40, 49, 72, 83, 90, 93, 99,
107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
443, 450, 459, 467, 473, 477, 483, 491, 495, 497])
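leadB = b[:-1]   # lower bound of each consecutive pair
trailB = b[1:]   # upper bound of each consecutive pair
all_val = []
for dis_a in a:
    for l, t in zip(leadB, trailB):
        # Keep the pair that encloses dis_a (lower bound exclusive, upper bound inclusive)
        if l < dis_a <= t:
            all_val.append({'a': dis_a, 'lb': l, 'tb': t})
# The final output can be a pandas DataFrame or a NumPy array
df = pd.DataFrame(all_val)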
However, the approach above relies heavily on a nested (two-stage) for-loop. Is there a more efficient way to do this using built-in functions of NumPy or Pandas?
Answer:
This seems to be an ideal problem for np.searchsorted. There are two possible solutions, depending on your actual requirements:
- If every element of a is guaranteed to fall within the boundary range (b[0] < x <= b[-1]):
i = np.searchsorted(b, a)
df = pd.DataFrame({'a': a, 'lb': b[i - 1], 'tb': b[i]})
- If some elements of a may fall outside the boundary range, a more general solution is:
i = np.searchsorted(b, a)
m = ~np.isin(i, [0, len(b)])  # i == 0 or i == len(b) means no enclosing boundary pair
df = pd.DataFrame({'a': a})
df.loc[m, 'lb'], df.loc[m, 'tb'] = b[i[m] - 1], b[i[m]]
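Why np.searchsorted fits here: with its default side='left' it returns, for each x, the first index i such that x <= b[i], so b[i - 1] < x <= b[i], which is the same condition as the loop in the question. The sketch below uses a few hypothetical test values (not from the question) to show the two edge cases the general solution guards against: i == 0 when x <= b[0], and i == len(b) when x > b[-1]. The mask is written here as (i > 0) & (i < len(b)), which is equivalent to ~np.isin(i, [0, len(b)]) because searchsorted only ever returns indices in the range 0 to len(b).

```python
import numpy as np

# Boundary array from the question (sorted ascending, no duplicates)
b = np.array([4, 10, 18, 22, 28, 33, 40, 49, 72, 83, 90, 93, 99,
              107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
              215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
              317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
              443, 450, 459, 467, 473, 477, 483, 491, 495, 497])

# Hypothetical values: below the range, equal to b[0], interior,
# equal to an interior boundary, and above the range
x = np.array([2, 4, 65, 72, 500])

# First index i with x <= b[i]; for 0 < i < len(b) this gives b[i - 1] < x <= b[i]
i = np.searchsorted(b, x, side='left')
print(i.tolist())  # [0, 0, 8, 8, 62]

# Only values with 0 < i < len(b) have an enclosing pair
m = (i > 0) & (i < len(b))
print(m.tolist())  # [False, False, True, True, False]
print(b[i[m] - 1].tolist(), b[i[m]].tolist())  # [49, 49] [72, 72]
```

Note that 72 maps to the pair (49, 72) rather than (72, 83), because the upper bound is inclusive, just as in the original l < dis_a <= t test; the value 4 has no pair because the lower bound is exclusive.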
Result
a lb tb
0 65 49 72
1 251 240 258
2 431 420 441
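Since the question also asks about Pandas built-ins, pd.cut is another option worth noting. This is an alternative technique, not the method used in the answer above. With its default right=True (and include_lowest=False), pd.cut(a, bins=b) places each value into the interval (b[k], b[k+1]], i.e. l < x <= t, and the codes of the resulting Categorical give k, or -1 for a value outside every bin. The following is a minimal sketch, assuming b is sorted in ascending order:

```python
import numpy as np
import pandas as pd

a = np.array([65, 251, 431])
# Boundary array from the question (must be monotonically increasing for pd.cut)
b = np.array([4, 10, 18, 22, 28, 33, 40, 49, 72, 83, 90, 93, 99,
              107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
              215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
              317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
              443, 450, 459, 467, 473, 477, 483, 491, 495, 497])

# Bin k is the right-closed interval (b[k], b[k+1]]; code -1 means "outside all bins"
codes = pd.cut(a, bins=b).codes

m = codes >= 0
df = pd.DataFrame({'a': a})
df.loc[m, 'lb'] = b[codes[m]]      # lower (exclusive) bound of the bin
df.loc[m, 'tb'] = b[codes[m] + 1]  # upper (inclusive) bound of the bin
print(df)
```

Depending on the pandas version, the lb and tb columns created through .loc may be stored as floats; the boundary values themselves are the same as in the result above.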