I've been told that double for loops are slow in Python. However, while digging into the implementation of Scipy.spatial.distance.cdist(), I found that it's just a double for loop and not some clever numpy trick. And yet, many answers (e.g. https://stackoverflow.com/a/62416370) recommend using cdist()
.
My question is: why is the function still so fast if it's just naively iterating over all elements in O(n^2)?
CodePudding user response:
Function with two for loops that you are pointing to, _cdist_callable()
, is not used in most cases. When you read cdist()
function you can see that _cdist_callable()
is run for example when you provide your own metric function.
In typical scenario, when you provide metric in form of a string: euclidean
, chebyshev
, cityblock
, etc., C-optimized functions are being used instead.
And "handles" to those C-optimized functions for those metrics are listed here, in the same file.