Efficient method to find nearest datetime's for large dataframes-CodePudding

I have a pandas dataframe with two columns, both are datetime instances. The first column is made of measurement timings and the second column is the sum of the first column with a constant offset. E.g assuming a constant offset of 1 gives:

index	Measurement_time	offset_time
0	0.1	1.2
1	0.5	1.5
2	1.2	2.2
3	2.4	3.4

I would like to find the index of each measurement_time that closest matches the offset_time with the condition that the measurement_time must be smaller than or equal to the offset_time. The solution to the given example would therefore be:

index = [2, 2, 2, 3]

I have tried using get_loc and making a mask but because my dataframe is large, these solutions are too inefficient.

Any help would be greatly appreciated!

CodePudding user response：

Let us use np.searchsorted to find the indices of closest matches

s = df['Measurement_time'].sort_values()
np.searchsorted(s, df['offset_time'], side='right') - 1

Result:

array([2, 2, 2, 3], dtype=int64)

Note: You may skip the .sort_values part if your dataframe is already sorted on the column Measurement_time