Subtracting two-dimensional arrays using NumPy broadcasting

I'm new to NumPy, so this is probably an easy question, but I'm clueless as to how to solve it.
I'm trying to implement the k-nearest neighbors algorithm for classifying a data set.

There are two arrays named new_points and points, with shapes (30,4) and (120,4) respectively (4 being the number of properties of each element).
So I'm trying to calculate the distance between each new point and all old points using NumPy broadcasting.

def calc_no_loop(new_points, points):
  return np.sum((new_points-points)**2,axis=1)
# This doesn't work; here is the error:

ValueError: operands could not be broadcast together with shapes (30,4) (120,4)

However, as per the rules of broadcasting, two arrays of shapes (30,4) and (120,4) are incompatible, so I would appreciate any insight on how to solve this (using .reshape perhaps, not sure).

Please note that I have already implemented the same function using one and two loops, but I can't implement it without a loop.

def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d


def calc_one_loop(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        # axis=1 sums over the properties, giving one value per old point
        d[i] = np.sum((new_points[i] - points)**2, axis=1)
    return d

CodePudding user response：

Let's create a smaller example:

nNew = 3; nOld = 5    # Number of new / old points
# New points
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)

To compute the differences alone, run:

dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]

So far we have differences in each property of each point (every new point with every old point).

The shape of dfr is (3, 5, 4):

first dimension: the index of the new point,
second dimension: the index of the old point,
third dimension: the property in which the difference is taken.
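
The same shape logic applies to the sizes in the question. Here is a minimal sketch, assuming zero-filled placeholder arrays rather than real data, that prints the intermediate shapes:

```python
import numpy as np

# Placeholder data; only the shapes from the question matter here.
new_points = np.zeros((30, 4))
points = np.zeros((120, 4))

a = new_points[:, np.newaxis, :]   # shape (30, 1, 4)
b = points[np.newaxis, :, :]       # shape (1, 120, 4)
dfr = a - b                        # size-1 axes are stretched to match

print(a.shape, b.shape, dfr.shape)
```

Broadcasting compares shapes axis by axis, and an axis of size 1 can be stretched to any size. That is why (30, 1, 4) and (1, 120, 4) combine, while (30, 4) and (120, 4) do not.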

Then, to sum the squared differences for each pair of points, run:

d = np.power(dfr, 2).sum(axis=2)

and this is your result.

For my sample data, the result is:

array([[31334, 25926, 21030, 16646, 12774],
       [34230, 28566, 23414, 18774, 14646],
       [37254, 31334, 25926, 21030, 16646]], dtype=int32)
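
Put together as a function, the two steps might look like the sketch below. It reuses the names calc_no_loop and calc_two_loops from the question; the random test data and the seed are arbitrary assumptions, used only to compare the broadcast version against the double loop.

```python
import numpy as np

def calc_no_loop(new_points, points):
    """Squared Euclidean distances, shape (len(new_points), len(points))."""
    # (m, 1, k) - (1, n, k) broadcasts to (m, n, k)
    diff = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sum(diff ** 2, axis=2)

def calc_two_loops(new_points, points):
    """Reference implementation with explicit loops."""
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j]) ** 2)
    return d

rng = np.random.default_rng(0)      # arbitrary seed, for repeatability
new_points = rng.random((30, 4))
points = rng.random((120, 4))

d = calc_no_loop(new_points, points)
print(d.shape)
print(np.allclose(d, calc_two_loops(new_points, points)))
```

Note that the intermediate array has shape (m, n, k), so memory use grows with the product of all three sizes.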

CodePudding user response：

So you have 30 new points and 120 old points, so if I understand you correctly, you want a result array of distances with shape (120,30).

You could do

import numpy as np

points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)

def calc_no_loop(new_points, points):
    # One row per old point, one column per new point
    res = np.zeros([len(points), len(new_points)])
    for idx in range(len(points)):
        res[idx, :] = np.sum((points[idx, :] - new_points)**2, axis=1)
    return np.sqrt(res)

test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)

Which gives

(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
 [0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
 [0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
 ...
 [0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
 [0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
 [1.08515826 0.64626221 0.6898687  ... 0.96882542 1.08075076 0.80144746]]

But from your function name above I get the notion that you do not want a loop? Then you could do this instead:

def calc_no_loop(new_points, points):
    # Expand both arrays to the common shape (len(points), len(new_points), 4)
    new_points1 = np.repeat(new_points[np.newaxis,...],len(points),axis=0)
    points1 = np.repeat(points[:,np.newaxis,:],len(new_points),axis=1)
    return np.sqrt(np.sum((new_points1-points1)**2 ,axis=2))

test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)

which has output:

(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
 [0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
 [0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
 ...
 [0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
 [0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
 [1.08515826 0.64626221 0.6898687  ... 0.96882542 1.08075076 0.80144746]]

i.e. the same result. Note that I added np.sqrt() to the result, which you may have forgotten in your example above.
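
The explicit np.repeat calls are not strictly needed, since broadcasting stretches the size-1 axes on its own. Here is a minimal sketch of that variant; the function name calc_distances is hypothetical, and the seeded random data is an assumption used only to compare it with the np.repeat version:

```python
import numpy as np

def calc_distances(new_points, points):
    """Euclidean distances, shape (len(points), len(new_points))."""
    # (n, 1, k) - (1, m, k) broadcasts to (n, m, k) without copying inputs
    diff = points[:, np.newaxis, :] - new_points[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

rng = np.random.default_rng(0)      # arbitrary seed, for repeatability
points = rng.random((120, 4))
new_points = rng.random((30, 4))

dist = calc_distances(new_points, points)

# Reference: the np.repeat approach, written out for comparison
new_points1 = np.repeat(new_points[np.newaxis, ...], len(points), axis=0)
points1 = np.repeat(points[:, np.newaxis, :], len(new_points), axis=1)
ref = np.sqrt(np.sum((new_points1 - points1) ** 2, axis=2))

print(dist.shape)
print(np.allclose(dist, ref))
```

If the distances are only used to rank neighbors, the square root can be skipped: it is monotonic, so the ordering of the squared distances is the same.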