I have two pandas dataframe "A" and "B". I would like to find out row number from "B" where value of "A" lies in between two numbers of "B" data frame.
Table A
Index | 0 |
---|---|
0 | 0.084 |
1 | 0.169 |
2 | 0.252 |
3 | 0.337 |
4 | 0.419 |
5 | 0.504 |
6 | 0.589 |
Table B
Index | 0 |
---|---|
0 | 0.071 |
1 | 0.167 |
2 | 0.244 |
3 | 0.320 |
In the case of the above tables let's take one example. The First Number from Table "A" is 0.084 it's Actually in between Table B 0 & 1 Index value i.e. 0.071 and 0.167. I am looking out for an output as [0,1] which is basically row numbers of two values.
CodePudding user response:
First inilize emopty array for result: res = [[]] * len(A.iloc[:, 0])
Then we implement nested loop through A and B, and check each value in A is between B values and return the index
The condition return the start index only:
(A.iloc[:, 0][i] > B.iloc[:, 0][j]) & (A.iloc[:, 0][i] < B.iloc[:, 0][j 1])
So I get the value and add 1 to it later and add them to a list:
res[i]=([j , j 1])
The full code:
import pandas as pd
A = [0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589]
B = [0.071, 0.167, 0.244, 0.320]
A = pd.DataFrame(A)
B = pd.DataFrame(B)
res = [[]] * len(A.iloc[:, 0])
for i in range(0, len(A.iloc[:, 0])):
for j in range(0, len(B.iloc[:, 0])-1):
if (A.iloc[:, 0][i] > B.iloc[:, 0][j]) & (A.iloc[:, 0][i] < B.iloc[:, 0][j 1]):
res[i]=([j , j 1])
print(res)
Note: I assume that B is always sorted in ascending order
CodePudding user response:
I assume that B
is sorted in ascending order, and that the bins are non-overlapping. To bin a value, you can first take the difference between that value and boundaries of all available bins. The desired bin is then found at index i
where the difference is positive at i
but transits to negative at i 1
.
def apply_fn(x):
delta = x.iloc[:,1] - x.iloc[:,3]
mask = delta.ge(0) & delta.shift(-1).lt(0) # True where transit happens
return x[mask]
A = pd.DataFrame([0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589])
B = pd.DataFrame([0.071, 0.167, 0.244, 0.320])
C = A.reset_index().join(B.reset_index(), lsuffix='_A', rsuffix='_B', how='cross')
D = C.groupby('index_A').apply(apply_fn).reset_index(drop=True)[['index_A', 'index_B']]
D['index_B'] = [[i, i 1] for i in D['index_B'].tolist()]
print(D)
Output
index_A index_B
0 0 [0, 1]
1 1 [1, 2]
2 2 [2, 3]