I'm new to the numpy in general so this is an easy question however i'm clueless as how to solve it.
i'm trying to implement K nearest neighbor algorithm for classification of a Data set
there are to arrays named new_points and point that respectively have the shape of (30,4)
and (120,4) (with 4 being the total number of the properties of each element)
so i'm trying to calculate the distance between each new point and all old points using numpy.broadcasting
def calc_no_loop(new_points, points):
return np.sum((new_points-points)**2,axis=1)
#doesn't work here is log
ValueError: operands could not be broadcast together with shapes (30,4) (120,4)
however as per rules of broadcasting two array of shapes (30,4) and (120,4) are incompatible
so i would appreciate any insight on how to slove this (using .reshape prehaps - not sure)
please note: that i'have already implemented the same function using one and two loops but can't implement it without one
def calc_two_loops(new_points, points):
m, n = len(new_points), len(points)
d = np.zeros((m, n))
for i in range(m):
for j in range(n):
d[i, j] = np.sum((new_points[i] - points[j])**2)
return d
def calc_one_loop(new_points, points):
m, n = len(new_points), len(points)
d = np.zeros((m, n))
print(d)
for i in range(m):
d[i] = np.sum((new_points[i] - points)**2)
return d
CodePudding user response:
Let's create an exapmle smaller in size:
nNew = 3; nOld = 5 # Number of new / old points
# New points
new_points = np.arange(100, 100 nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 nOld * 8, 2).reshape(nOld, 4)
To compute the differences alone, run:
dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
So far we have differences in each property of each point (every new point with every old point).
The shape of dfr is (3, 5, 4):
- first dimension: the number of new point,
- second dimension: the number of old point,
- third dimension: the difference in each property.
Then, to sum squares of differences by points, run:
d = np.power(dfr, 2).sum(axis=2)
and this is your result.
For my sample data, the result is:
array([[31334, 25926, 21030, 16646, 12774],
[34230, 28566, 23414, 18774, 14646],
[37254, 31334, 25926, 21030, 16646]], dtype=int32)
CodePudding user response:
So you have 30 new points, and 120 old points, so if I understand you correctly you want a shape(120,30) array result of distances.
You could do
import numpy as np
points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)
def calc_no_loop(new_points, points):
res = np.zeros([len(points[:,0]),len(new_points[:,0])])
for idx in range(len(points[:,0])):
res[idx,:] = np.sum((points[idx,:]-new_points)**2,axis=1)
return np.sqrt(res)
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
Which gives
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
But from your function name above I get the notion that you do not want a loop? Then you could do this instead:
def calc_no_loop(new_points, points):
new_points1 = np.repeat(new_points[np.newaxis,...],len(points),axis=0)
points1 = np.repeat(points[:,np.newaxis,:],len(new_points),axis=1)
return np.sqrt(np.sum((new_points-points1)**2 ,axis=2))
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
which has output:
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
i.e. the same result. Note that I added the np.sqrt() into the result which you may have forgotten in your example above.