Find the closest pair of boundary values that an integer falls between, using NumPy or pandas in Python

Time:04-30

Given a 1D array

a=np.array([ 65, 251, 431])

and another 1D array used to construct the boundaries.

b=np.array([  4,  10,  18,  22,  28,  33,  40,  49,  72,  83,  90,  93,  99,
              107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
              215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
              317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
              443, 450, 459, 467, 473, 477, 483, 491, 495, 497])

For example, a boundary pair can be any two points taken from b, such as (4, 10), (4, 18), (4, 497), ..., (495, 497).

The objective is to find, for every integer in array a, the closest (tightest) pair of boundary values that the integer lies between.

For example, for the value 65 the closest boundary pair is (49, 72).

The code below achieves this objective:

import numpy as np
import pandas as pd
a=np.array([ 65, 251, 431])

# Assumed `b` is sorted from lowest to highest value and no duplicate values
b=np.array([  4,  10,  18,  22,  28,  33,  40,  49,  72,  83,  90,  93,  99,
              107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
              215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
              317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
              443, 450, 459, 467, 473, 477, 483, 491, 495, 497])


leadB =b[:-1]
trailB=b[1:]

all_val=[]
for dis_a in a:
    for l,t in zip(leadB,trailB):
        if l < dis_a <= t:
            all_val.append({'a':dis_a,'lb':l,'tb':t})

# The final output can be in the form of pandas or numpy array
df=pd.DataFrame(all_val)

However, the above approach relies heavily on a nested for-loop. I wonder whether there is a more efficient way of doing this with a built-in function of NumPy or pandas.

User response:

This seems to be an ideal problem for np.searchsorted. There are two possible solutions, depending on your actual requirements:

  • If all the elements of a are guaranteed to fall between the boundary points (that is, b[0] < a <= b[-1]):
i = np.searchsorted(b, a)
df = pd.DataFrame({'a': a, 'lb': b[i - 1], 'tb': b[i]})
  • If some elements of a may fall outside the boundary points, a more general solution is:
i = np.searchsorted(b, a)
m = ~np.isin(i, [0, len(b)])

df = pd.DataFrame({'a': a})
df.loc[m, 'lb'], df.loc[m, 'tb'] = b[i[m] - 1], b[i[m]]

Result

     a   lb   tb
0   65   49   72
1  251  240  258
2  431  420  441
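
To make the edge cases concrete, here is a minimal sketch of the general solution wrapped in a function and run on a few values that fall outside the boundaries. The function name `bracket` and the extra test values are assumptions added for illustration, not part of the original answer. It relies on the default `side='left'` of np.searchsorted, which returns an index i such that b[i - 1] < a <= b[i], the same condition as the loop in the question.

```python
import numpy as np
import pandas as pd


def bracket(a, b):
    """For each value in `a`, find the adjacent pair (lb, tb) in `b` with lb < a <= tb.

    Assumes `b` is sorted ascending with no duplicates (as in the question).
    Values with no enclosing pair get NaN in both columns.
    `bracket` is a hypothetical helper name used only in this sketch.
    """
    a = np.asarray(a)
    b = np.asarray(b)

    # side='left' (the default): i satisfies b[i - 1] < a <= b[i]
    i = np.searchsorted(b, a, side='left')

    # i == 0 means a <= b[0]; i == len(b) means a > b[-1]: no enclosing pair
    inside = (i > 0) & (i < len(b))

    # Float arrays so that missing pairs can be represented as NaN
    lb = np.full(a.shape, np.nan)
    tb = np.full(a.shape, np.nan)
    lb[inside] = b[i[inside] - 1]
    tb[inside] = b[i[inside]]

    return pd.DataFrame({'a': a, 'lb': lb, 'tb': tb})


b = np.array([  4,  10,  18,  22,  28,  33,  40,  49,  72,  83,  90,  93,  99,
              107, 113, 119, 130, 142, 161, 167, 173, 178, 183, 196, 202, 209,
              215, 221, 228, 233, 240, 258, 262, 269, 274, 281, 286, 297, 311,
              317, 352, 354, 358, 365, 371, 376, 382, 389, 396, 413, 420, 441,
              443, 450, 459, 467, 473, 477, 483, 491, 495, 497])

# The three values from the question, plus edge cases added for illustration:
# 2 (below b[0]), 4 (equal to b[0]), 10 and 497 (equal to a boundary), 600 (above b[-1])
a = np.array([65, 251, 431, 2, 4, 10, 497, 600])

df = bracket(a, b)
print(df)
```

With this convention a value equal to a boundary is assigned to the pair that ends at it (10 goes to (4, 10)), while values at or below b[0] or above b[-1] are left as NaN. Passing side='right' instead would give the condition b[i - 1] <= a < b[i], if that is what you need.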