Run values of one dataframe through another and find the index of similar value from dataframe-CodePudding

I have two dataframes both consisting of a 1 column with 62 values each:

Distance_obs = [    
0.0
0.9084
    2.1931
    2.85815
    3.3903
    3.84815
    4.2565
    4.6287
    4.97295
    5.29475
    5.598
    5.8856
    6.15975
    6.4222
    6.67435
    6.9173
    7.152
    7.37925
    7.5997
    7.8139
    8.02235
8.22555
    8.42385
    8.61755
    8.807
    8.99245
    9.17415
    9.35235
    9.5272
    9.6989
    9.86765
    10.0335
    10.1967
    10.3574
    10.5156
    10.6714
    10.825
    10.9765
    11.1259
    11.2732
    11.4187
    11.5622
    11.7041
    11.8442
    11.9827
    12.1197
    12.2552
    12.3891

    12.5216


    12.6527
    12.7825


    12.9109

    13.0381

    13.1641


    13.2889

    13.4126

    13.5351

    13.6565


    13.7768


    13.8961


    14.0144


    14.0733
]

and

Cell_mid = [0.814993
1.96757
2.56418
3.04159
3.45236
3.8187
4.15258
4.46142
4.75013
5.02221
5.28026
5.52624
5.76172
5.98792
6.20588
6.41642
6.62027
6.81802
7.01019
7.19723
7.37952
7.55742
7.73122
7.9012
8.0676
8.23063
8.39049
8.54736
8.70141
8.85277
9.00159
9.14798
9.29207
9.43396
9.57374
9.71152
9.84736
9.98136
10.1136
10.2441
10.373
10.5003
10.626
10.7503
10.8732
10.9947
11.1149
11.2337
11.3514
11.4678
11.5831
11.6972
11.8102
11.9222
12.0331
12.143
12.2519
12.3599
12.4669
12.573
12.6782
12.7826
]

I want to run each element in Distance_obs to run through the values in Cell_mid and find the corresponding index of nearest value which matches the element.

I have been trying using the following:

for i in Distance_obs:
        Required_value = (np.abs(Cell_mid - i)).idxmin()

But I get

error: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')

CodePudding user response：

One way to do this, could be as follows:

Use pd.merge_asof, passing "nearest" to the direction parameter.
Now, from the merged result select column Cell_mid, and use Series.map with a pd.Series where the values and index of its original df (here: df2) are swapped.

df['Cell_mid_index'] = pd.merge_asof(df, df2, 
                                     left_on='Distance_obs', 
                                     right_on='Cell_mid', 
                                     direction='nearest')\
    ['Cell_mid'].map(pd.Series(df2['Cell_mid'].index.values, 
                               index=df2['Cell_mid']))

print(df.head())

   Distance_obs  Cell_mid_index
0       0.00000               0
1       0.90840               0
2       2.19310               1
3       2.85815               3
4       3.39030               4

So, at the intermediate step, we had a merged df like this:

print(pd.merge_asof(df, df2, left_on='Distance_obs', 
                    right_on='Cell_mid', direction='nearest').head())

   Distance_obs  Cell_mid
0       0.00000  0.814993
1       0.90840  0.814993
2       2.19310  1.967570
3       2.85815  3.041590
4       3.39030  3.452360

And then with .map we are retrieving the appropriate index values from df2.

Data used

import pandas as pd

Distance_obs = [0.0, 0.9084, 2.1931, 2.85815, 3.3903, 3.84815, 4.2565, 
                4.6287, 4.97295, 5.29475, 5.598, 5.8856, 6.15975, 6.4222, 
                6.67435, 6.9173, 7.152, 7.37925, 7.5997, 7.8139, 8.02235, 
                8.22555, 8.42385, 8.61755, 8.807, 8.99245, 9.17415, 9.35235, 
                9.5272, 9.6989, 9.86765, 10.0335, 10.1967, 10.3574, 10.5156, 
                10.6714, 10.825, 10.9765, 11.1259, 11.2732, 11.4187, 11.5622, 
                11.7041, 11.8442, 11.9827, 12.1197, 12.2552, 12.3891, 12.5216, 
                12.6527, 12.7825, 12.9109, 13.0381, 13.1641, 13.2889, 13.4126,
                13.5351, 13.6565, 13.7768, 13.8961, 14.0144, 14.0733]

df = pd.DataFrame(Distance_obs, columns=['Distance_obs'])

Cell_mid = [0.814993, 1.96757, 2.56418, 3.04159, 3.45236, 3.8187, 4.15258, 
            4.46142, 4.75013, 5.02221, 5.28026, 5.52624, 5.76172, 5.98792, 
            6.20588, 6.41642, 6.62027, 6.81802, 7.01019, 7.19723, 7.37952, 
            7.55742, 7.73122, 7.9012, 8.0676, 8.23063, 8.39049, 8.54736, 
            8.70141, 8.85277, 9.00159, 9.14798, 9.29207, 9.43396, 9.57374, 
            9.71152, 9.84736, 9.98136, 10.1136, 10.2441, 10.373, 10.5003, 
            10.626, 10.7503, 10.8732, 10.9947, 11.1149, 11.2337, 11.3514, 
            11.4678, 11.5831, 11.6972, 11.8102, 11.9222, 12.0331, 12.143, 
            12.2519, 12.3599, 12.4669, 12.573, 12.6782, 12.7826]

df2 = pd.DataFrame(Cell_mid, columns=['Cell_mid'])