Home > other >  Find the closest values in a sorted pandas dataframe to values in a list
Find the closest values in a sorted pandas dataframe to values in a list

Time:06-10

I have seen :

How do I find the closest values in a Pandas series to an input number?

I have a pandas DataFrame like this :

idx col1 col2
1 2 56
2 3 22
3 6 12
4 7 7
5 7.5 6
6 9 9
7 10.1 11
8 11 23

and an input list like this :

[ 4, 7.6, 10]

I want to keep the same number of rows as the list length, where the elements in df['col1'] are the closest to the elements in the list such that y output DataFrame is :

idx col1 col2
2 3 22
5 7.5 6
7 10.1 11

What is an efficient solution when dataframe and list get big?

CodePudding user response:

First method is by substraction but have you considered the solution using partitionning in the topic you mentionned?

CodePudding user response:

You can use broadcasting in numpy to obtain the differences and then get the index conaininng the minimum absolute value

a = np.array([4,7.6,10]).reshape(1,-1) #np.array([[4,7.6,10]])
df.iloc[abs(df.col1.to_numpy()[:,None] - a).argmin(0)]

   idx  col1  col2
1    2   3.0    22
4    5   7.5     6
6    7  10.1    11

CodePudding user response:

There's merge_asof for matching sorted data:

pd.merge_asof(pd.DataFrame({'key':inpt}), df, 
              right_on='col1', left_on='key',
              direction='nearest')

Output:

    key  idx  col1  col2
0   4.0    2   3.0    22
1   7.6    5   7.5     6
2  10.0    7  10.1    11
  • Related