Finding Value between two numbers in pandas dataframe-CodePudding

I have two pandas dataframe "A" and "B". I would like to find out row number from "B" where value of "A" lies in between two numbers of "B" data frame.

Table A

Index	0
0	0.084
1	0.169
2	0.252
3	0.337
4	0.419
5	0.504
6	0.589

Table B

Index	0
0	0.071
1	0.167
2	0.244
3	0.320

In the case of the above tables let's take one example. The First Number from Table "A" is 0.084 it's Actually in between Table B 0 & 1 Index value i.e. 0.071 and 0.167. I am looking out for an output as [0,1] which is basically row numbers of two values.

CodePudding user response：

First inilize emopty array for result: res = [[]] * len(A.iloc[:, 0])

Then we implement nested loop through A and B, and check each value in A is between B values and return the index

The condition return the start index only:

(A.iloc[:, 0][i] > B.iloc[:, 0][j]) & (A.iloc[:, 0][i] < B.iloc[:, 0][j 1])

So I get the value and add 1 to it later and add them to a list:

res[i]=([j , j 1])

The full code:

import pandas as pd

A = [0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589]
B = [0.071, 0.167, 0.244, 0.320]

A = pd.DataFrame(A)
B = pd.DataFrame(B)

res = [[]] * len(A.iloc[:, 0])

for i in range(0, len(A.iloc[:, 0])):
    for j in range(0, len(B.iloc[:, 0])-1):
        if (A.iloc[:, 0][i] > B.iloc[:, 0][j]) & (A.iloc[:, 0][i] < B.iloc[:, 0][j 1]):
            res[i]=([j , j 1])

print(res)

The output:

Note: I assume that B is always sorted in ascending order

CodePudding user response：

I assume that B is sorted in ascending order, and that the bins are non-overlapping. To bin a value, you can first take the difference between that value and boundaries of all available bins. The desired bin is then found at index i where the difference is positive at i but transits to negative at i 1.

def apply_fn(x):
    delta = x.iloc[:,1] - x.iloc[:,3]
    mask = delta.ge(0) & delta.shift(-1).lt(0)  # True where transit happens
    return x[mask]

A = pd.DataFrame([0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589])
B = pd.DataFrame([0.071, 0.167, 0.244, 0.320])
C = A.reset_index().join(B.reset_index(), lsuffix='_A', rsuffix='_B', how='cross')
D = C.groupby('index_A').apply(apply_fn).reset_index(drop=True)[['index_A', 'index_B']]
D['index_B'] = [[i, i 1] for i in D['index_B'].tolist()]
print(D)

Output

    index_A index_B
0   0   [0, 1]
1   1   [1, 2]
2   2   [2, 3]