I have an array of timestamp data (arr) and a list of start/end pairs (cuts). The goal is to extract the timestamps that fall between each start and end and collect them into a new array. I have tried two methods, one with np.searchsorted() and one with np.argmin(), but they give different results. Is there an explanation for this?
Thank you!
Here is my code:
import numpy as np
# Initialization data
arr = np.arange(761.55643, 1525.5704932002686, 1/ 1000)
cuts = [[810.211186646, 899.102014549], [903.520741867, 982.000921478], [985.201032795, 993.400610844],
[998.303881868, 1085.500698357], [1090.200656211, 1168.101925871], [1171.299249968, 1179.611318749],
[1184.610645285, 1271.597569677], [1275.600586067, 1363.696138556], [1368.301122947, 1455.500707533]]
# Function
vector_validity = np.zeros(len(arr))
new_arr_with_argmin = np.zeros(0)
for cut in cuts:
    vector_validity[int(np.searchsorted(arr, cut[0])) : int(np.searchsorted(arr, cut[1]))] = 1
    print(f"np.searchsorted start: {np.searchsorted(arr, cut[0])}")
    print(f"np.argmin start: {np.argmin(abs(arr - cut[0]))}")
    print(f"np.searchsorted end: {np.searchsorted(arr, cut[1])}")
    print(f"np.argmin end: {np.argmin(abs(arr - cut[1]))}")
    new_arr_with_argmin = np.concatenate((new_arr_with_argmin, arr[np.argmin(abs(arr - cut[0])) : np.argmin(abs(arr - cut[1]))]))
new_arr_with_searchsorted = arr[vector_validity == 1]
The result of the print:
> np.searchsorted start: 48655
> np.argmin start: 48655
> np.searchsorted end: 137546
> np.argmin end: 137546
> np.searchsorted start: 141965
> np.argmin start: 141964
> np.searchsorted end: 220445
> np.argmin end: 220444
> np.searchsorted start: 223645
> np.argmin start: 223645
> np.searchsorted end: 231845
> np.argmin end: 231844
> np.searchsorted start: 236748
> np.argmin start: 236747
> np.searchsorted end: 323945
> np.argmin end: 323944
> np.searchsorted start: 328645
> np.argmin start: 328644
> np.searchsorted end: 406546
> np.argmin end: 406545
> np.searchsorted start: 409743
> np.argmin start: 409743
> np.searchsorted end: 418055
> np.argmin end: 418055
> np.searchsorted start: 423055
> np.argmin start: 423054
> np.searchsorted end: 510042
> np.argmin end: 510041
> np.searchsorted start: 514045
> np.argmin start: 514044
> np.searchsorted end: 602140
> np.argmin end: 602140
> np.searchsorted start: 606745
> np.argmin start: 606745
> np.searchsorted end: 693945
> np.argmin end: 693944
So starting with the second interval, the two methods sometimes return different indices. What explains this result?
CodePudding user response:
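The argmin method finds the index of the closest value, which is not what searchsorted does: searchsorted returns the index at which the value would have to be inserted to keep the array sorted (by default, the first index i with a[i] >= v).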
Here's a simple example:
In [130]: a = np.array([1, 2])
For inputs such as v=1.05 and v=1.95 (both between 1 and 2), the position returned by searchsorted(a, v) is 1:
In [131]: np.searchsorted(a, [1.05, 1.95])
Out[131]: array([1, 1])
Your argmin-based method does not give the same result for input values that are closer to 1 than to 2 (a tie, as with 1.5, resolves to the first index):
In [137]: np.argmin(abs(a - 1.05))
Out[137]: 0
In [138]: np.argmin(abs(a - 1.5))
Out[138]: 0
In [139]: np.argmin(abs(a - 1.51))
Out[139]: 1
In [140]: np.argmin(abs(a - 1.95))
Out[140]: 1
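If what you actually want is the nearest index but without scanning the whole array, one possible approach is to start from searchsorted and then compare the two neighbouring elements. The sketch below assumes a sorted, non-empty 1-D array; the helper name nearest_index is made up for this illustration and is not part of NumPy.

```python
import numpy as np

def nearest_index(sorted_arr, value):
    """Index of the element closest to `value` (hypothetical helper).

    Assumes `sorted_arr` is sorted in ascending order and non-empty.
    """
    # First index i with sorted_arr[i] >= value (default side='left').
    i = np.searchsorted(sorted_arr, value)
    if i == 0:
        return 0                      # value is at or below the first element
    if i == len(sorted_arr):
        return len(sorted_arr) - 1    # value is above the last element
    # Choose the closer neighbour; a tie goes to the left one, like argmin.
    left, right = sorted_arr[i - 1], sorted_arr[i]
    return i - 1 if value - left <= right - value else i

a = np.array([1.0, 2.0])
for v in (1.05, 1.5, 1.51, 1.95):
    print(v, np.searchsorted(a, v), np.argmin(np.abs(a - v)), nearest_index(a, v))
```

On this small array the last two columns agree for every v, while the searchsorted column is always 1. Put differently, searchsorted rounds up to the next element and argmin rounds to the nearest one, which would explain why the indices in your output differ by either 0 or 1.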