I'd like to compute, in Python, a matrix whose element (i, j) is the distance between the i-th and j-th points. The naive approach below takes 0.43 sec to build a single matrix. Is there any way to speed up this code?
I'm fine with using widely used packages such as SciPy or scikit-learn.
import numpy as np
import time

def compute_distance_matrix(points: np.ndarray):
    assert points.ndim == 2
    n_point, n_dim = points.shape
    squared_dist_matrix = np.zeros((n_point, n_point))
    for i, p in enumerate(points):
        # Column i holds the squared distances from every point to point i.
        squared_dist_matrix[:, i] = np.sum((points - p) ** 2, axis=1)
    dist_matrix = np.sqrt(squared_dist_matrix)
    return dist_matrix

a = np.random.randn(1000, 4)
n_trials = 10
ts = time.time()
for _ in range(n_trials):
    compute_distance_matrix(a)
# Divide by the number of trials so the printed value is a per-call average.
print("average time {} sec".format((time.time() - ts) / n_trials))
Answer:
You can use SciPy's cdist or scikit-learn's pairwise_distances. Both are much faster, e.g.:
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import pairwise_distances

a = np.random.randn(1000, 4)

# Option 1: SciPy
D = cdist(a, a)

# Option 2: scikit-learn (same result)
D = pairwise_distances(a)
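SciPy also offers pdist, which exploits the fact that a distance matrix is symmetric with a zero diagonal: it computes each pair only once and returns a condensed 1-D vector, and squareform expands that vector into the full square matrix. A minimal sketch, assuming Euclidean distance (pdist's default metric); whether it is actually faster than cdist(a, a) in practice is worth timing on your own machine.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

a = np.random.randn(1000, 4)

# pdist returns the condensed distance vector: only the n*(n-1)/2
# entries above the diagonal, computed once per pair.
condensed = pdist(a)  # Euclidean by default
print(condensed.shape)  # (499500,)

# squareform expands the condensed vector into the full n x n matrix,
# filling in the mirrored lower triangle and the zero diagonal.
D = squareform(condensed)
print(D.shape)  # (1000, 1000)
```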
Both are about 10x faster than the custom code. On my machine, cdist() was the fastest, but I don't know the implementation details or how much the ranking depends on hardware.
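If you'd rather avoid extra dependencies, the Python loop from the question can also be removed with plain NumPy broadcasting. The sketch below (distance_matrix_broadcast is a hypothetical helper name) builds an (n, n, d) array of coordinate differences, so memory grows as n*n*d; for 1000 points in 4 dimensions that is about 32 MB of float64. Because relative speed depends on the machine, it also times each approach with timeit so you can check the ranking yourself.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import pairwise_distances


def distance_matrix_broadcast(points: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances using only NumPy (hypothetical helper)."""
    assert points.ndim == 2
    # (n, 1, d) - (1, n, d) broadcasts to an (n, n, d) array of differences.
    diff = points[:, None, :] - points[None, :, :]
    # Sum squared differences over the coordinate axis, then take the root.
    return np.sqrt(np.sum(diff ** 2, axis=-1))


a = np.random.randn(1000, 4)

# Sanity check: the NumPy-only result should match SciPy's cdist.
assert np.allclose(distance_matrix_broadcast(a), cdist(a, a))

n_trials = 10
candidates = [
    ("numpy broadcast", lambda: distance_matrix_broadcast(a)),
    ("scipy cdist", lambda: cdist(a, a)),
    ("sklearn pairwise_distances", lambda: pairwise_distances(a)),
]
for name, fn in candidates:
    # timeit returns the total for all calls; divide to get seconds per call.
    avg = timeit.timeit(fn, number=n_trials) / n_trials
    print(f"{name}: {avg:.4f} sec per call")
```

Dividing the timeit total by n_trials gives a true per-call average, which keeps the numbers comparable with the timing loop in the question.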