Home > Blockchain >  Efficient method to find nearest datetime's for large dataframes
Efficient method to find nearest datetime's for large dataframes

Time:03-20

I have a pandas dataframe with two columns, both are datetime instances. The first column is made of measurement timings and the second column is the sum of the first column with a constant offset. E.g assuming a constant offset of 1 gives:

index Measurement_time offset_time
0 0.1 1.2
1 0.5 1.5
2 1.2 2.2
3 2.4 3.4

I would like to find the index of each measurement_time that closest matches the offset_time with the condition that the measurement_time must be smaller than or equal to the offset_time. The solution to the given example would therefore be:

index = [2, 2, 2, 3]

I have tried using get_loc and making a mask but because my dataframe is large, these solutions are too inefficient.

Any help would be greatly appreciated!

CodePudding user response:

Let us use np.searchsorted to find the indices of closest matches

s = df['Measurement_time'].sort_values()
np.searchsorted(s, df['offset_time'], side='right') - 1

Result:

array([2, 2, 2, 3], dtype=int64)

Note: You may skip the .sort_values part if your dataframe is already sorted on the column Measurement_time

  • Related