Simple Python program of threshold detection too slow

Python (and programming) newbie here. I wrote a function that goes through two arrays, x and y (x is linearly spaced time and y holds the measurements), and discards "events" whose signal never reaches a certain threshold. x and y each have shape (12000000,). start and stop are arrays of the same shape, roughly (600,), which contain the beginning and end time of each event, respectively. threshold is a float.

The code works, but it is very slow. I am not sure whether that is because of the use of np.where or because I have to call np.nanmax inside a loop.

Any tips on how to make this run faster?


import numpy as np

def event_threshold(x, y, start, stop, threshold):
        
    result_start = []
    result_stop = []
    
    for i_start,i_stop in zip(start,stop): 
        
        start_x = np.where(x == i_start)[0][0]
        stop_x = np.where(x == i_stop)[0][0]
        
        if threshold >= 0:
            if np.nanmax(y[start_x:stop_x]) >= threshold:
                #Add elements if cross positive threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
        else: #if threshold is negative
            if np.nanmin(y[start_x:stop_x]) <= threshold:
                #Add elements if cross negative threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
    return result_start, result_stop

CodePudding user response：

As a general rule, a list comprehension is often faster than a for-loop that grows its result with np.append, because np.append copies the whole array on every call.

Based on your comment, I have updated the list comprehension solution to handle events whose indices are not back-to-back. See the code example below. It speeds up the computation by more than 10x in my case. Let me know if that is what you were looking for.

Cheers

import time 
import numpy as np

def event_threshold (x, y, start, stop, threshold):
        
    result_start = []
    result_stop = []
    
    for i_start,i_stop in zip(start,stop): 
        
        start_x = np.where(x == i_start)[0][0]
        stop_x = np.where(x == i_stop)[0][0]
        
        if threshold >= 0:
            if np.nanmax(y[start_x:stop_x]) >= threshold:
                #Add elements if cross positive threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
        else: #if threshold is negative
            if np.nanmin(y[start_x:stop_x]) <= threshold:
                #Add elements if cross negative threshold
                result_start = np.append(result_start, i_start)
                result_stop = np.append(result_stop, i_stop)
    return result_start, result_stop

# create minimal example
n=100001
start_width = 100

x = np.arange(n)
y = np.random.rand(n)
start = np.arange(0, n - 1, start_width)
# each event ends 1 to start_width-1 samples after it starts
stop = start + np.random.uniform(1, start_width, len(start)).astype(int)
threshold = 0.95

# timed result event threshold function
t_0 = time.time()
result_stop_ = event_threshold(x,y,start,stop,threshold)
print(f'time elapsed (s): {time.time()-t_0}')

# timed result list comprehension
t_0 = time.time()
result_stop = [(start[i], stop[i]) for i in range(len(stop)) if np.nanmax(abs(y[start[i]:stop[i]])) >= threshold]
result_stop = np.array(result_stop)
print(f'time elapsed (s): {time.time()-t_0}')

PS: Note, though, that the result is a single np.array of (start, stop) pairs and not a tuple of two arrays.
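
One caveat on the comprehension above: it slices y with the start and stop values themselves, which works in the example only because x is simply 0, 1, 2, ... there. It also compares abs(y) with the threshold, which is not the same test as the original nanmax/nanmin branches when y or the threshold can be negative. If x holds real time stamps, a possible variant is sketched below. It assumes x is sorted in ascending order and that every start and stop value occurs in x, as in the question; the name event_threshold_fast is made up for illustration. np.searchsorted does a binary search per event instead of the full scan over x that np.where(x == value) performs.

```python
import numpy as np

def event_threshold_fast(x, y, start, stop, threshold):
    """Keep the events whose extreme y value crosses the threshold.

    Assumes x is sorted ascending and that each start/stop time is
    present in x (hypothetical helper, not part of the original post).
    """
    start = np.asarray(start)
    stop = np.asarray(stop)

    # Binary search for the sample index of each event boundary
    start_idx = np.searchsorted(x, start)
    stop_idx = np.searchsorted(x, stop)

    # Same sign convention as the original function
    if threshold >= 0:
        extremes = np.array([np.nanmax(y[a:b]) for a, b in zip(start_idx, stop_idx)])
        keep = extremes >= threshold
    else:
        extremes = np.array([np.nanmin(y[a:b]) for a, b in zip(start_idx, stop_idx)])
        keep = extremes <= threshold

    # Boolean mask instead of repeated np.append calls
    return start[keep], stop[keep]
```

Like the original function, this returns a tuple (result_start, result_stop), and it still raises on an empty slice (an event with start equal to stop).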