Subtracting Two dimensional arrays using numpy broadcasting


I'm new to NumPy, so this is probably an easy question, but I'm clueless as to how to solve it.
I'm trying to implement the k-nearest neighbors algorithm for classifying a data set.

There are two arrays named new_points and points, with shapes (30, 4) and (120, 4) respectively (4 being the number of properties of each element).
I'm trying to calculate the distance between each new point and all old points using NumPy broadcasting:

def calc_no_loop(new_points, points):
  return np.sum((new_points-points)**2,axis=1)
# This doesn't work. Here is the error:

ValueError: operands could not be broadcast together with shapes (30,4) (120,4)

However, by the rules of broadcasting, two arrays of shapes (30, 4) and (120, 4) are incompatible: the trailing dimensions match (4 and 4), but 30 and 120 differ and neither of them is 1. I would appreciate any insight on how to solve this (using .reshape perhaps, I'm not sure).

Please note that I have already implemented the same function using two loops and using one loop, but I can't implement it without any loop:

def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d

def calc_one_loop(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        # axis=1 keeps one value per old point; without it np.sum returns a single scalar
        d[i] = np.sum((new_points[i] - points)**2, axis=1)
    return d

CodePudding user response:

Let's create a smaller example:

import numpy as np

nNew = 3; nOld = 5    # Number of new / old points
# New points
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)

To compute the differences alone, run:

dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]

So far we have differences in each property of each point (every new point with every old point).

The shape of dfr is (3, 5, 4):

  • first dimension: the index of the new point,
  • second dimension: the index of the old point,
  • third dimension: the difference in each property.
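
The shapes involved can be checked directly. The following is a minimal sketch using the same sample data as above; the names a and b are introduced here only for illustration:

```python
import numpy as np

new_points = np.arange(100, 112).reshape(3, 4)   # 3 new points, 4 properties
points = np.arange(10, 50, 2).reshape(5, 4)      # 5 old points, 4 properties

a = new_points[:, np.newaxis, :]   # shape (3, 1, 4)
b = points[np.newaxis, :, :]       # shape (1, 5, 4)
dfr = a - b                        # axes of length 1 are stretched: shape (3, 5, 4)

print(a.shape, b.shape, dfr.shape)

# dfr[i, j] holds the per-property differences between new point i and old point j
print(np.array_equal(dfr[2, 4], new_points[2] - points[4]))
```

Each pair of dimensions is now either equal or contains a 1, which is what the broadcasting rules require.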

Then, to sum the squared differences for each pair of points, run:

d = np.power(dfr, 2).sum(axis=2)

and this is your result: d[i, j] is the squared distance between new point i and old point j.

For my sample data, the result is:

array([[31334, 25926, 21030, 16646, 12774],
       [34230, 28566, 23414, 18774, 14646],
       [37254, 31334, 25926, 21030, 16646]], dtype=int32)
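
Putting the two steps together for the shapes from the question, a loop-free version might look like the sketch below. It assumes random test data (the seed is arbitrary) and compares the result against the two-loop function from the question:

```python
import numpy as np

def calc_no_loop(new_points, points):
    """Squared distances, shape (len(new_points), len(points))."""
    dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.power(dfr, 2).sum(axis=2)

def calc_two_loops(new_points, points):
    """Reference implementation from the question."""
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d

rng = np.random.default_rng(0)     # arbitrary seed, test data only
new_points = rng.random((30, 4))
points = rng.random((120, 4))

d = calc_no_loop(new_points, points)
print(d.shape)                     # (30, 120)
print(np.allclose(d, calc_two_loops(new_points, points)))
```

Note that the intermediate array has shape (30, 120, 4), so memory use grows with the product of both point counts.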

CodePudding user response:

So you have 30 new points and 120 old points, so if I understand you correctly, you want a result array of distances with shape (120, 30).

You could do

import numpy as np

points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)

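def calc_no_loop(new_points, points):
    # Despite the name, this version still loops over the old points
    res = np.zeros((len(points), len(new_points)))
    for idx in range(len(points)):
        # Squared differences to every new point, summed over the properties
        res[idx, :] = np.sum((points[idx, :] - new_points)**2, axis=1)
    return np.sqrt(res)

test = calc_no_loop(new_points, points)
print(test.shape)
print(test)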

Which gives

(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
 [0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
 [0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
 [0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
 [0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
 [1.08515826 0.64626221 0.6898687  ... 0.96882542 1.08075076 0.80144746]]

But from your function name above I get the notion that you do not want a loop? Then you could do this instead:

def calc_no_loop(new_points, points):
    # Expand both arrays to the common shape (len(points), len(new_points), 4)
    new_points1 = np.repeat(new_points[np.newaxis, ...], len(points), axis=0)
    points1 = np.repeat(points[:, np.newaxis, :], len(new_points), axis=1)
    return np.sqrt(np.sum((new_points1 - points1)**2, axis=2))

test = calc_no_loop(new_points, points)
print(test.shape)
print(test)

which has output:

(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
 [0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
 [0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
 [0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
 [0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
 [1.08515826 0.64626221 0.6898687  ... 0.96882542 1.08075076 0.80144746]]

i.e. the same result. Note that I added np.sqrt() to the result, which you may have forgotten in your example above.
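
Because broadcasting already stretches axes of length 1, the np.repeat calls are not strictly needed. Here is a minimal sketch comparing both variants; the function names calc_with_repeat and calc_broadcast are chosen here for illustration only:

```python
import numpy as np

def calc_with_repeat(new_points, points):
    # Explicitly copy both arrays up to shape (len(points), len(new_points), 4)
    new_points1 = np.repeat(new_points[np.newaxis, ...], len(points), axis=0)
    points1 = np.repeat(points[:, np.newaxis, :], len(new_points), axis=1)
    return np.sqrt(np.sum((new_points1 - points1)**2, axis=2))

def calc_broadcast(new_points, points):
    # Let NumPy broadcast (len(points), 1, 4) against (1, len(new_points), 4)
    diff = points[:, np.newaxis, :] - new_points[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=2))

points = np.random.random((120, 4))
new_points = np.random.random((30, 4))

r1 = calc_with_repeat(new_points, points)
r2 = calc_broadcast(new_points, points)
print(r1.shape, r2.shape)
print(np.allclose(r1, r2))
```

The broadcast variant avoids the two explicit copies made by np.repeat, although the array of differences still has the full three-dimensional shape.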
