Find the closest values in a sorted pandas dataframe to values in a list-CodePudding

I have seen :

How do I find the closest values in a Pandas series to an input number?

I have a pandas DataFrame like this :

idx	col1	col2
1	2	56
2	3	22
3	6	12
4	7	7
5	7.5	6
6	9	9
7	10.1	11
8	11	23

and an input list like this :

[ 4, 7.6, 10]

I want to keep the same number of rows as the list length, where the elements in df['col1'] are the closest to the elements in the list such that y output DataFrame is :

idx	col1	col2
2	3	22
5	7.5	6
7	10.1	11

What is an efficient solution when dataframe and list get big?

CodePudding user response：

First method is by substraction but have you considered the solution using partitionning in the topic you mentionned?

CodePudding user response：

You can use broadcasting in numpy to obtain the differences and then get the index conaininng the minimum absolute value

a = np.array([4,7.6,10]).reshape(1,-1) #np.array([[4,7.6,10]])
df.iloc[abs(df.col1.to_numpy()[:,None] - a).argmin(0)]

   idx  col1  col2
1    2   3.0    22
4    5   7.5     6
6    7  10.1    11

CodePudding user response：

There's merge_asof for matching sorted data:

pd.merge_asof(pd.DataFrame({'key':inpt}), df, 
              right_on='col1', left_on='key',
              direction='nearest')

Output:

    key  idx  col1  col2
0   4.0    2   3.0    22
1   7.6    5   7.5     6
2  10.0    7  10.1    11