I have a function that takes up ~95% of my execution time. I call it multiple times, and it's making my program incredibly slow. I can't use the built-in multiprocessing library because of my reliance on PyCharm: https://youtrack.jetbrains.com/issue/PY-53235.
Here's my core problem:
import numpy as np
import time
import ray # confirmed this library works in my env
def get_matched_time(match: np.ndarray, template: np.ndarray) -> list:
    """
    Return an array containing the closest value
    in the template array to the input array.

    Args:
        match : for each value of match,
            find the closest value in template.
        template : values to match to

    Returns:
        matched_time : iterable of size match.size containing
            values found in template closest to each value in match.
    """
    matched_time, matched_index = [], []
    for t in match:  # for each and every match value
        temp = [abs(t - valor) for valor in template]  # iterate template values for the closest time
        matched_index.append(temp.index(min(temp)))  # add the index
    return [template[idx] for idx in matched_index]
(...)
The match array (or any iterable) can vary between 10 and 12,000 values. The template array is usually upwards of 10,000 values as well, but is always larger than match.
if __name__ == "__main__":
    start = time.time()
    returned_time = get_matched_time(match, template)
    end = time.time()
    print(f"Execution time: {end - start} s")
>>>Execution time: 12.573657751083374 s
Just in case it's unclear:
match = [1.1, 2.1, 5.1] # will always be smaller than template
template = [1.0, 2.0, 3.0, 4.0, 5.0]
My desired output is [1.0, 2.0, 5.0] in any iterable form; order doesn't really matter because I can just sort by value afterwards.
I'm hoping to be able to assign each iteration of for t in match to its own process. I have been trying this with Ray Parallel Iterators, but I don't understand the documentation well enough to implement it. If anybody can recommend a more efficient method, or a way to incorporate multiprocessing via ray or another library, I would much appreciate it.
CodePudding user response:
You can significantly improve the performance of your search by using vectorized numpy calls in place of your iterations. In particular, we can replace this code:
for t in match:  # for each and every match value
    temp = [abs(t - valor) for valor in template]
    matched_index.append(temp.index(min(temp)))
With much faster numpy operations. Rather than searching, we use vectorized math to compute all the absolute differences between match and template at once. Then we find the minima of these absolute differences and the indices at which they occur (the argmin). Finally, we return the matches:
def get_matched_time(match: np.ndarray, template: np.ndarray) -> np.ndarray:
    match = match.reshape(-1, 1)   # column vector, so broadcasting yields a (len(match), len(template)) grid
    a = np.abs(match - template)   # absolute difference between every pair of values
    mins = np.argmin(a, axis=1)    # index of the closest template value for each match value
    return template[mins]          # look up those template values
In this code, I return an array since the inputs are arrays, which makes the result interoperable with numpy functions that might be used on the outputs.
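For example, feeding the question's sample data through the function above (converted to numpy arrays) gives the desired output:

match = np.array([1.1, 2.1, 5.1])
template = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(get_matched_time(match, template))  # [1. 2. 5.]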
CodePudding user response:
You don't need more parallelism, you need better code.
So, you're using numpy, but you still use a Python loop to go through all elements of the array. To make matters worse, you're not even just doing that with a generator expression: you're actually building a temporary list by encasing it in []!
That's a bad idea. You can express the same idea, calculating the distance to a single value, simply as subtraction and absolute value on the whole template array. That alone will very likely lead to a speedup you could live with.
Then you find the minimum element and then go searching for that same minimum element again, instead of just asking numpy to give you the argmin. That's a colossal duplication of effort; there are better and easier ways to do it, as sketched below.
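Roughly, and assuming both inputs really are numpy arrays as the type hints claim, those two fixes collapse the loop body to a single vectorized difference plus an argmin (a sketch, not a tested drop-in replacement):

def get_matched_time(match: np.ndarray, template: np.ndarray) -> list:
    matched_time = []
    for t in match:
        # subtraction and abs over the whole template array at once,
        # then argmin instead of temp.index(min(temp))
        idx = np.argmin(np.abs(template - t))
        matched_time.append(template[idx])
    return matched_time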
Honestly, though, you're searching a sortable array for an element, but instead of being smart about that, you just check all elements, when you could be much smarter about "narrowing down" on the closest value. You're using an exhaustive search where a guided walk to the right point would work; don't.
Take your template array, sort it once (not every time you call this function, just once), then use e.g. a binary search to find the closest element. That's logarithmic in effort per lookup, not linear like your current code.
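A sketch of that idea using numpy's built-in binary search, np.searchsorted; it assumes match is a numpy array and that the template has already been sorted once, outside the function:

def get_matched_time(match: np.ndarray, sorted_template: np.ndarray) -> np.ndarray:
    # binary search: index where each match value would be inserted
    pos = np.searchsorted(sorted_template, match)
    pos = np.clip(pos, 1, len(sorted_template) - 1)  # keep both neighbours in bounds
    left = sorted_template[pos - 1]
    right = sorted_template[pos]
    # pick whichever neighbour is closer to each match value
    return np.where(np.abs(match - left) <= np.abs(match - right), left, right)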
If your problem were numerically large (12,000 values is not large for a modern computer in a way that would justify indirect memory accesses), you'd need to think about data structures that represent searchable vector spaces, like k-d trees (your vector space is 1-D, so a binary search over the sorted array is the equivalent).
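For completeness, if the data ever did grow into something genuinely higher-dimensional, that kind of structure already exists off the shelf, e.g. scipy.spatial.cKDTree (SciPy isn't used in the question, so this is only an illustration, and it's overkill for 1-D data of this size):

import numpy as np
from scipy.spatial import cKDTree

match = np.array([1.1, 2.1, 5.1])
template = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

tree = cKDTree(template.reshape(-1, 1))    # build once; points must be 2-D
_, idx = tree.query(match.reshape(-1, 1))  # nearest-neighbour indices
print(template[idx])                       # [1. 2. 5.]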
All in all, your code reads like you haven't realized that numpy has documentation you can use to find methods like argmin.