Home > Software engineering >  Finding all the rows with the approximate values that match a condition in a dataframe
Finding all the rows with the approximate values that match a condition in a dataframe

Time:09-09

I have a panda's dataframe that is something like this

el1 el2  x     el3 
 n   m   6     f
 n2  m2  7     f2
....
 n10 m10 19.3  f10
n11  m11  21    f11
....

The el1, el2, and el3 do not matter at all. I want to find the row with the X nearest to x=20

so I do

min_index=df['x'].sub(20).abs().idxmin()

which gives me the index where x=19.3 is

So far so good

Now, the real problem

Imagine I have a dataframe where x goes from 6 to 31 and then again from 7 to 45 and then again from 2 to 27

I want a way to find a list of indexes [idx1, idx2...idxn] (in the above example n would be 3) with the most approximate values to say for example 20.

How can I do this with pandas and python?

EDIT: An example of the df would be

el1 el2  x     el3 
 n   m   6     f
 n2  m2  7     f2
 ....
 n10 m10 19.3  f10
 n11  m11  21    f11
 ....
 n20 m20   31    f20
 n21 m21   7     f21
 n22  m22  8.1   f22
 ....
 n29   m29  19.8  f29
 n30   m30  21    f30
 ...
n35   m35   45    f35
n36   m36   2     f36
n37   m37   3     f37
....
n45  m45    19.9  f45
n46  m46    22    f46
...
n50  m50   27    f50

The rows I want are the ones where x=19.3 x=19.8 x=x=19.9

CodePudding user response:

Assuming you have consecutive stretches of increasing values and want to find the closest to 20 for each:

group = df['x'].diff().lt(0).cumsum()

out = df.loc[df['x'].sub(20).abs().groupby(group).idxmin()]

example input:

import numpy as np
df = pd.DataFrame({'x': np.r_[np.linspace(6,31,5),
                              np.linspace(7,45,5),
                              np.linspace(2,27,5),
                             ],
                   'el1': '-'
                  })

output:

        x el1
2   18.50   -
6   16.50   -
13  20.75   -
  • Related