How to get the data at the index closest to a certain value in Pandas

Time:06-08

So I have the following data in a dataframe, altData.

        TimeMS     Alt
75      565473000  27.75
113     565473200  27.81
155     565473400  27.79
193     565473600  27.79
229     565473800  27.83
...           ...    ...
143343  565476800  28.68
143381  565477000  28.70
143419  565477200  28.72
143461  565477400  28.80
143499  565477600  28.85

[3613 rows x 2 columns]

This is time series data from a drone flight. The index records the relative position at which each row was logged; for example, the row with TimeMS 565473800 and Alt 27.83 was the 229th piece of data logged by the drone. I also have data for when the drone crashed, which looks like this:

           Subsys  ECode
59963      12      1

This means a crash was logged at index 59963, relative to the other data. I need to find the timestamp of the crash, but I cannot look up index 59963 in altData because that index does not exist there. I need a way to find the smallest index in altData that is greater than 59963, so I can read its TimeMS field to get the timestamp of the crash.

My naive solution is:

index = 59963
while index not in altData.index:
    index += 1

This works for a single dataframe, but since I need to do it for about 500 of them, it seems very inefficient. What should I do?

Answer:

What about using:

df.loc[59963:].head(1)

output:

           TimeMS    Alt
143343  565476800  28.68

Or, to get just the index label:

df.index.to_frame().ge(59963).idxmax().squeeze()

output: 143343
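A small self-contained check of both expressions from this answer, using made-up values rather than the real drone log. It assumes the index is sorted ascending, which `.loc[num:]` label slicing relies on when `num` is not itself a label. The `idxmax` variant silently returns the first label when no label is >= `num`, so the sketch adds a guard for that case.

```python
import pandas as pd

# Illustrative data only: a sorted, gappy integer index like altData's
df = pd.DataFrame(
    {"TimeMS": [565473000, 565473200, 565473400, 565473600],
     "Alt": [27.75, 27.81, 27.79, 27.79]},
    index=[75, 113, 155, 193],
)

num = 120

# Label-based slice: all rows whose label is >= num, then keep the first one
first_row = df.loc[num:].head(1)
print(first_row)

# Boolean-mask version: idxmax returns the label of the first True
mask = df.index.to_frame().ge(num)
label = mask.idxmax().squeeze()
# idxmax also returns the first label when the mask is all False, so check it
if not mask.any().squeeze():
    label = None
print(label)
```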

Answer:

Searchsorted

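You can use the following code, where `num` is the value to look up and the index is sorted ascending (`np.searchsorted` requires this):

import numpy as np

idx = np.searchsorted(df.index, num)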

df.index[max(0, idx)]
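`np.searchsorted` also accepts an array of query values, so many crash indices can be resolved in one call instead of a Python loop. The sketch below assumes a non-empty index sorted ascending; the helper name `next_labels` and the `-1` "not found" marker are hypothetical choices. It also handles the case the snippet above does not: when `num` is larger than every label, `searchsorted` returns `len(index)`, and indexing with that position raises `IndexError`.

```python
import numpy as np
import pandas as pd

def next_labels(index, values):
    """Hypothetical helper: for each value, return the first label in a
    non-empty, ascending index that is >= value, or -1 if none is."""
    index = np.asarray(index)
    pos = np.searchsorted(index, values, side="left")
    # A position equal to len(index) means no label is >= value
    out_of_range = pos == len(index)
    # Clamp positions so the fancy indexing below never raises IndexError
    labels = index[np.minimum(pos, len(index) - 1)]
    return np.where(out_of_range, -1, labels)

idx = pd.Index(range(0, 20, 2))          # 0, 2, ..., 18
print(next_labels(idx, [7, 8, 19, 0]))   # [ 8  8 -1  0]
```

With `side="left"`, an exact match returns the matching label itself (8 maps to 8), which fits "closest to and greater than or equal to".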

Example

The dataset is:

import pandas as pd

df = pd.DataFrame({'Time': range(10), 'Alt': range(10)}, index=range(0, 20, 2))

Output:

    Time  Alt
0      0    0
2      1    1
4      2    2
6      3    3
8      4    4
10     5    5
12     6    6
14     7    7
16     8    8
18     9    9

Then

num = 7
idx = np.searchsorted(df.index, num)
df.index[max(0, idx)]

Output:

8

If you want the nearest index strictly below the number you enter, change:

df.index[max(0, idx)]

to

df.index[max(0, idx-1)]

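The output for the example above would then be 6 instead of 8. Note that when num is at or below the first label, max(0, idx-1) falls back to the first label, which is not below num.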
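A pandas-only alternative (not used in this answer) is `Index.get_indexer` with `method="bfill"` or `method="ffill"`. It returns positions rather than labels and uses -1 where there is no match, which avoids the silent fallback of `max(0, ...)`. It assumes a unique, monotonic index. Unlike `idx-1`, `"ffill"` treats an exact match as a hit, so it finds the last label <= the target rather than strictly below it.

```python
import pandas as pd

# Same labels as the example dataset; the index must be unique and sorted
idx = pd.Index([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
targets = [7, 8, 19, -1]

# bfill: position of the first label >= target (exact matches count)
after = idx.get_indexer(targets, method="bfill")
# ffill: position of the last label <= target (exact matches count)
before = idx.get_indexer(targets, method="ffill")

print(after)   # [ 4  4 -1  0]
print(before)  # [ 3  4  9 -1]

# Convert the valid positions back to labels
print(idx[after[after >= 0]].tolist())
```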

Finally, here is a function that wraps all of this:

import numpy as np

def next_index(df, num):
    # Returns the first index label >= num (the index must be sorted ascending),
    # or None if every label is smaller than num
    idx = np.searchsorted(df.index, num)
    return df.index[idx] if idx < len(df.index) else None
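A different route, not taken in either answer, is `pandas.merge_asof` with `direction="forward"`. It attaches to each crash row the columns of the first altData row whose index label is >= the crash label, so the crash timestamp comes out directly as a column. The frame names mirror the question, the sample values are made up, and both frames must be sorted by index.

```python
import pandas as pd

# Made-up stand-ins for the question's altData and crash data
alt_data = pd.DataFrame(
    {"TimeMS": [565473000, 565473200, 565476800, 565477000],
     "Alt": [27.75, 27.81, 28.68, 28.70]},
    index=[75, 113, 143343, 143381],
)
crashes = pd.DataFrame({"Subsys": [12], "ECode": [1]}, index=[59963])

# For each crash row, take the first alt_data row whose label is >= the crash label.
# merge_asof requires both indexes to be sorted ascending.
merged = pd.merge_asof(
    crashes.sort_index(),
    alt_data.sort_index(),
    left_index=True,
    right_index=True,
    direction="forward",
)
print(merged)
```

For the 500 flights, this means one `merge_asof` call per flight, and each call handles all of that flight's crash rows at once.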