So I have the following data in a dataframe, altData:
TimeMS Alt
75 565473000 27.75
113 565473200 27.81
155 565473400 27.79
193 565473600 27.79
229 565473800 27.83
... ... ...
143343 565476800 28.68
143381 565477000 28.70
143419 565477200 28.72
143461 565477400 28.80
143499 565477600 28.85
[3613 rows x 2 columns]
This is the time series data of a drone flight. The index corresponds to the relative position at which the data was logged, e.g. the row 565473800 27.83 was the 229th piece of data logged by the drone. I also have data for when the drone crashed, which looks like this:
Subsys ECode
59963 12 1
This means that a crash was logged at index 59963, relative to the other data. I need to find the timestamp of the crash, but I cannot look up index 59963 in altData because that index does not exist there. I need a way to find the index in altData that is closest to and greater than 59963, so that I can extract the TimeMS field and get the timestamp of the crash.
My naive solution is:
index = 59963
while index not in altData.index:
    index += 1
This works for a single dataframe, but since I need to do this for about 500 of them, it seems very inefficient. What should I do?
Answer:
What about using label slicing? Since the index is sorted, this works even though 59963 itself is not in the index:
df.loc[59963:].head(1)
output:
TimeMS Alt
143343 565476800 28.68
Or, getting the index:
df.index.to_frame().ge(59963).idxmax().squeeze()
output: 143343
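A self-contained sketch of the slicing approach, assuming altData's index is sorted ascending (the `loc` slice with a missing label relies on that). The frame below is a hypothetical stand-in built only from the rows printed in the question, so with the full 3613-row log the matched row will likely be different. It also guards against a crash logged after the last altitude sample, where the slice comes back empty.

```python
import pandas as pd

# Hypothetical stand-in for altData, built from the rows shown in the question only.
alt_data = pd.DataFrame(
    {
        "TimeMS": [565473000, 565473200, 565473400, 565473600, 565473800,
                   565476800, 565477000, 565477200, 565477400, 565477600],
        "Alt": [27.75, 27.81, 27.79, 27.79, 27.83,
                28.68, 28.70, 28.72, 28.80, 28.85],
    },
    index=[75, 113, 155, 193, 229, 143343, 143381, 143419, 143461, 143499],
)

crash_index = 59963

# Slicing by a label that is not in the index only works on a sorted index.
if not alt_data.index.is_monotonic_increasing:
    alt_data = alt_data.sort_index()

after_crash = alt_data.loc[crash_index:]

if after_crash.empty:
    crash_time = None  # crash was logged after the last altitude sample
else:
    crash_time = after_crash["TimeMS"].iloc[0]

print(crash_time)  # prints: 565476800
```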
Answer:
Searchsorted
You can use NumPy's searchsorted, which does a binary search on the (sorted) index:
import numpy as np
idx = np.searchsorted(df.index, num)
df.index[max(0, idx)]
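The `side` argument of `np.searchsorted` decides how an exact match is treated: the default `side="left"` returns the position of `num` itself if it is already in the index, while `side="right"` skips past it, which matches a strict "greater than". A small sketch with a hypothetical index:

```python
import numpy as np
import pandas as pd

index = pd.Index([0, 2, 4, 6, 8])  # hypothetical sorted index

# side="left" (default): first position whose label is >= 4
left_pos = np.searchsorted(index, 4, side="left")
# side="right": first position whose label is strictly > 4
right_pos = np.searchsorted(index, 4, side="right")

print(index[left_pos], index[right_pos])  # prints: 4 6
```

In both cases the returned position equals `len(index)` when `num` is larger than every label, so `index[pos]` would raise an IndexError; check for that before indexing.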
Example
Given this dataset:
import pandas as pd
df = pd.DataFrame({'Time': range(10), 'Alt': range(10)}, index=range(0, 20, 2))
Output:
Time Alt
0 0 0
2 1 1
4 2 2
6 3 3
8 4 4
10 5 5
12 6 6
14 7 7
16 8 8
18 9 9
Then
num = 7
idx = np.searchsorted(df.index, num)
df.index[max(0, idx)]
Output:
8
If you want the closest index below the number you enter instead, change:
df.index[max(0, idx)]
to
df.index[max(0, idx-1)]
With the example above, the output would then be 6 instead of 8.
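And just for good measure, here is a function that handles all of this:
import numpy as np

def next_index(df, num):
    # Return the first index label >= num (df.index must be sorted ascending)
    idx = np.searchsorted(df.index, num)
    if idx == len(df.index):
        return None  # num is past the last logged index
    return df.index[idx]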
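For the roughly 500 flights, one possible alternative (not used in the answers above) is `pandas.merge_asof` with `direction="forward"`, which matches every row of a crash log to the first altData row whose index is greater than or equal to it, in a single call. This is a sketch assuming the crash log's index counts in the same logging sequence as altData's index; `alt_data` and `crash_data` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for one flight's altitude log and crash log.
alt_data = pd.DataFrame(
    {"TimeMS": [565473000, 565473200, 565476800, 565477000],
     "Alt": [27.75, 27.81, 28.68, 28.70]},
    index=[75, 113, 143343, 143381],
)
crash_data = pd.DataFrame({"Subsys": [12], "ECode": [1]}, index=[59963])

# For each crash row, take the first alt_data row whose index is >= the crash index.
# merge_asof requires both frames to be sorted by the join key (here, the index).
matched = pd.merge_asof(
    crash_data.sort_index(),
    alt_data.sort_index(),
    left_index=True,
    right_index=True,
    direction="forward",
)
print(matched)
```

A crash with no later altitude sample gets NaN in TimeMS and Alt instead of raising an error. With many flights, this would be called once per flight, for example in a loop over the pairs of altitude and crash logs.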