I have two data frames df1
and df2
.
df1
has two columns. I'd like to find the index idx
of row
from df1
, that contains in col1
the minimum distance from the value of df2 col1
.
Next, I want to find the max value of df1 col2
from neighborhood [idx-10, idx 10]
. Then attach that value to df2
to corresponding row for which the min distance was found.
Here is an example of the two data frames:
df1 = pd.DataFrame({"A": [1,1.5,2.5,3,3.5,3.95,4.55,5.1, 5.54, 6.02, 6.57],
"B": [0.25, 0.6, 1.34, 0.75, 0.16, 1.04, 0.99, 0.23, 0.95, 0.65,0.22]})
A B
0 1.00 0.25
1 1.50 0.60
2 2.50 1.34
3 3.00 0.75
4 3.50 0.16
5 3.95 1.04
6 4.55 0.99
7 5.10 0.23
8 5.54 0.95
9 6.02 0.65
10 6.57 0.22
df2 = pd.DataFrame({"A1": [1.06, 2.34, 3.70, 6, 7.2], "C": ["pink", "blue", "orange", "pink", "orange"]})
A1 C
0 1.06 pink
1 2.34 blue
2 3.70 orange
3 6.00 pink
4 7.20 orange
For the convenience sake, I'll choose the interval [idx-2;idx 2]
. Here is the desired result:
df3 = pd.DataFrame({"A1":[1.06,2.34, 3.70, 6, 7.2], "B": [1.34, 1.34, 1.04, 0.95, 0.95], "C": ["pink", "blue", "orange",
"pink", "orange"]})
A1 B C
0 1.06 1.34 pink
1 2.34 1.34 blue
2 3.70 1.04 orange
3 6.00 0.95 pink
4 7.20 0.95 orange
I know that I could merge these data frames, but the problem is that one of them, df1
, is very large, so it would take ridiculous amount of memory to compute all possible distances of columns A
and A1
. So, maybe there is something more efficient than that?
CodePudding user response:
Assuming df1
is already sorted on column A
, Lets use searchsorted
to get the indices of closest values in A
corresponding to the values from A1
, then make sure to clip the indices where the index value equal to length of df1
. Now, do a centered rolling max on df1.B
, finally use the closest indices to get the corresponding maximum values within a interval
i = np.searchsorted(df1['A'], df2['A1'], side='right')
i[i == len(df1)] -= 1
w = 2
r = df1['B'].rolling(2 * w 1, center=True, min_periods=1).max()
df2['B'] = r.iloc[i].tolist()
A1 C B
0 1.06 pink 1.34
1 2.34 blue 1.34
2 3.70 orange 1.04
3 6.00 pink 0.95
4 7.20 orange 0.95