I have seen :
How do I find the closest values in a Pandas series to an input number?
I have a pandas DataFrame like this :
idx | col1 | col2 |
---|---|---|
1 | 2 | 56 |
2 | 3 | 22 |
3 | 6 | 12 |
4 | 7 | 7 |
5 | 7.5 | 6 |
6 | 9 | 9 |
7 | 10.1 | 11 |
8 | 11 | 23 |
and an input list like this :
[ 4, 7.6, 10]
I want to keep the same number of rows as the list length, where the elements in df['col1'] are the closest to the elements in the list such that y output DataFrame is :
idx | col1 | col2 |
---|---|---|
2 | 3 | 22 |
5 | 7.5 | 6 |
7 | 10.1 | 11 |
What is an efficient solution when dataframe and list get big?
CodePudding user response:
First method is by substraction but have you considered the solution using partitionning in the topic you mentionned?
CodePudding user response:
You can use broadcasting
in numpy to obtain the differences and then get the index conaininng the minimum absolute value
a = np.array([4,7.6,10]).reshape(1,-1) #np.array([[4,7.6,10]])
df.iloc[abs(df.col1.to_numpy()[:,None] - a).argmin(0)]
idx col1 col2
1 2 3.0 22
4 5 7.5 6
6 7 10.1 11
CodePudding user response:
There's merge_asof
for matching sorted data:
pd.merge_asof(pd.DataFrame({'key':inpt}), df,
right_on='col1', left_on='key',
direction='nearest')
Output:
key idx col1 col2
0 4.0 2 3.0 22
1 7.6 5 7.5 6
2 10.0 7 10.1 11