Python (and programming) newbie here. I wrote a function that goes through two arrays (x and y; x is linear time and y holds the measurements) to delete "events" whose peak is smaller than a certain threshold. x and y each have shape (12000000,). start and stop are arrays of the same shape, ~(600,), which contain the beginning and end time of each event, respectively. threshold is a float.
The code works but it is very slow. I am not sure if it's because of the use of np.where or if it's because I am having to loop through np.nanmax.
Any tips on how to make this run faster?
def event_threshold(x, y, start, stop, threshold):
    result_start = []
    result_stop = []
    for i_start, i_stop in zip(start, stop):
        start_x = np.where(x == i_start)[0][0]
        stop_x = np.where(x == i_stop)[0][0]
        if threshold >= 0:
            if np.nanmax(y[start_x:stop_x]) >= threshold:
                # Add elements if cross positive threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
        else:  # if threshold is negative
            if np.nanmin(y[start_x:stop_x]) <= threshold:
                # Add elements if cross negative threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
    return result_start, result_stop
CodePudding user response:
As a general rule, replacing an explicit for-loop with a list comprehension often speeds up Python code, especially when the loop grows an array with np.append, which copies the whole array on every call.
I have updated the list comprehension solution to handle non-back-to-back indexes, based on your comment. See the code example below. It speeds up the computation by more than 10x in my case. Let me know if that is what you were looking for.
Cheers
import time
import numpy as np
def event_threshold(x, y, start, stop, threshold):
    result_start = []
    result_stop = []
    for i_start, i_stop in zip(start, stop):
        start_x = np.where(x == i_start)[0][0]
        stop_x = np.where(x == i_stop)[0][0]
        if threshold >= 0:
            if np.nanmax(y[start_x:stop_x]) >= threshold:
                # Add elements if cross positive threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
        else:  # if threshold is negative
            if np.nanmin(y[start_x:stop_x]) <= threshold:
                # Add elements if cross negative threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
    return result_start, result_stop
# create minimal example
n=100001
start_width = 100
x = np.array([i for i in range(n)])
y = np.random.rand(n)
start = np.array([i for i in range(0,n-1,start_width)])
stop = start + [int(i) for i in np.random.uniform(1,start_width,len(start))]
threshold=0.95
# timed result event threshold function
t_0 = time.time()
result_stop_ = event_threshold(x,y,start,stop,threshold)
print(f'time elapsed (s): {time.time()-t_0}')
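# timed result list comprehension
# (start/stop are used directly as indexes, which works here because x == np.arange(n);
# abs() compares magnitudes, which matches the function above here because y >= 0 and threshold >= 0)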
t_0 = time.time()
result_stop = [(start[i], stop[i]) for i in range(len(stop)) if np.nanmax(abs(y[start[i]:stop[i]])) >= threshold]
result_stop = np.array(result_stop)
print(f'time elapsed (s): {time.time()-t_0}')
PS: Note that the result is a single np.array of (start, stop) pairs, not a tuple of two arrays.
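On the np.where question from the original post: each np.where(x == value) call scans all of x, so with 12 million samples and ~600 events that lookup alone is a lot of work. If x is sorted in ascending order (which "linear time" suggests, but that is an assumption here) and every start/stop value occurs exactly in x, a binary search with np.searchsorted should find the same indexes far more cheaply. Below is a minimal sketch of that idea; the function name event_threshold_sorted is made up for illustration, and it keeps the positive/negative threshold logic of the original function.

```python
import numpy as np

def event_threshold_sorted(x, y, start, stop, threshold):
    """Sketch: keep events whose peak reaches the threshold.

    Assumes x is sorted ascending and every start/stop value is present in x.
    """
    start = np.asarray(start)
    stop = np.asarray(stop)
    # Binary search for all events at once, instead of one full scan
    # of x per event with np.where.
    start_idx = np.searchsorted(x, start)
    stop_idx = np.searchsorted(x, stop)
    if threshold >= 0:
        # Keep events whose maximum reaches the positive threshold
        keep = [np.nanmax(y[a:b]) >= threshold for a, b in zip(start_idx, stop_idx)]
    else:
        # Keep events whose minimum reaches the negative threshold
        keep = [np.nanmin(y[a:b]) <= threshold for a, b in zip(start_idx, stop_idx)]
    # A boolean mask avoids growing arrays with np.append inside the loop
    keep = np.array(keep, dtype=bool)
    return start[keep], stop[keep]
```

Like the original function, this returns a tuple (result_start, result_stop), so it could be dropped in without changing the calling code. The loop over events is still plain Python, but with only ~600 events that part is unlikely to be the bottleneck.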