This is my first question on Stack Overflow, and my English is not very good, so I'm grateful to everyone who reads it and helps me ^_^
My question is about broadcasting. (image: the error message) What I want to do is multiply each row of X by the number in the same row of B…
X is a (100,3) array and XW is a column vector with shape (100,). Why can't they broadcast?
After I add XW = XW.reshape((X.shape[0],1)), they can broadcast. Why? Is there any difference between (100,1) and (100,)?
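Here is a minimal sketch of what happens (using a dummy X with the same shapes as in my real code below):

import numpy as np

X = np.ones((100, 3))
W = np.array([1.0, 2.0, 3.0])
XW = X @ W
print(XW.shape)        # (100,)
# X * XW               # ValueError: operands could not be broadcast together
XW = XW.reshape((X.shape[0], 1))
print(XW.shape)        # (100, 1)
print((X * XW).shape)  # (100, 3) -- now each row of X is scaled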
I think my picture has clearly described my question... My code is really long, so it may not be convenient to read...
Here is the code:
import numpy as np
import matplotlib.pyplot as plt


class MyFirstMachineLearningAlgorithm():
    def StochasticGradientDescent(self, W, X, count=100, a=0.1):
        n = X.shape[0]
        for i in range(count):  # train `count` times
            gradient = np.zeros(3)
            for j in range(n):
                gradient = X[j, :] * (1 - 2 * (X[j, :] @ W))
                W = W + a * gradient
                # fix the norm: rescale W to unit length
                W = W / np.sqrt(W @ W)
        return W

    def BatchGradientDescent(self, W, X, count=100, a=0.1):
        for i in range(count):
            XW = X @ W          # shape (100,)
            XW = 1 - 2 * XW
            #XW = XW.reshape((X.shape[0], 1))
            gradient = X * XW   # <- this is the line that fails to broadcast
            gradient = np.sum(gradient, axis=0)
            W = W + a * gradient
            # fix the norm: rescale W to unit length
            W = W / np.sqrt(W @ W)
        return W

    def train(self, count=100):
        self.W = self.BatchGradientDescent(self.W, self.X, count)

    def draw(self):
        draw_x = np.arange(-120, 120, 0.01)
        draw_y = -self.W[0] / self.W[1] * draw_x
        draw_y = [-self.W[2] / self.W[1] + draw_y[i] for i in range(len(draw_y))]
        plt.plot(draw_x, draw_y)
        plt.show()

    def __init__(self):
        array_size = (50, 2)
        array1 = np.random.randint(50, 100, size=array_size)
        array2 = np.random.randint(-100, -50, size=array_size)
        array = np.vstack((array1, array2))
        column = np.ones(100)
        self.X = np.column_stack((array, column))
        plt.scatter(array[:, 0], array[:, 1])
        self.W = np.array([1, 2, 3])
        self.W = self.W / np.sqrt(self.W @ self.W)


g = MyFirstMachineLearningAlgorithm()
g.train()
g.draw()
CodePudding user response:
It's best to post error information with copy-n-paste, not an image. Still, the image is better than nothing.
So the error occurs in the last line of this clip:
XW = X @ W
XW = 1 - 2 * XW
#XW = XW.reshape((X.shape[0],1))
gradient = X*XW
Just from the function definition I can't tell the shape of X and W. Apparently X is 2d, (100,n). If W is (n,), then XW will be (100,), with the sum-of-products on the n dimension. Read the np.matmul docs if that isn't clear.
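For example, with a small stand-in for the real arrays:

import numpy as np

X = np.arange(12).reshape(4, 3)   # small (4, 3) stand-in for the (100, 3) X
W = np.array([1, 2, 3])           # (3,)
print((X @ W).shape)              # (4,) -- 1d, not (4, 1)
# matmul with a 1d W sums the products over the last axis of X:
print(np.allclose(X @ W, (X * W).sum(axis=1)))   # True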
By the rules of broadcasting (look them up), if one array doesn't have as many dimensions as the other, it will add leading dimensions as needed. Thus (100,) can become (1,100). But to avoid ambiguity, it will not add a trailing dimension. You have to provide that yourself. So the last line should become
gradient = X * XW[:,None]
or the equivalent using XW.reshape(-1,1), or your version.
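A quick demonstration of both points - leading dimensions are added automatically, trailing ones are not, and all the column-making fixes are equivalent:

import numpy as np

X = np.ones((100, 3))
XW = np.ones(100)                       # (100,)
# (100,) is padded on the left to (1, 100), which conflicts with (100, 3):
# X * XW                                # ValueError
# adding the trailing dimension yourself fixes it:
print((X * XW[:, None]).shape)          # (100, 3)
print((X * XW.reshape(-1, 1)).shape)    # (100, 3)
print((X * XW.reshape((X.shape[0], 1))).shape)  # (100, 3)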
Because arrays can be 1d (or even 0d), terms like row vector or column vector have limited value. A 1d array can be thought of as a row vector in some cases - where this automatic leading dimension applies.
In __init__,
self.X = np.column_stack((array, column))
self.W = np.array([1, 2, 3])
X is (100,3) and W is (3,). X@W is then (100,).
In [45]: X=np.ones((100,3)); W=np.array([1,2,3])
In [46]: (X@W).shape
Out[46]: (100,)
In [47]: X * (1 + (X@W)[:,None]);
CodePudding user response:
I had already answered this question myself before I posted it, but I think it may be helpful to others, so I'm posting it anyway.
XW is derived from X@W, so it should be a 100x1 matrix, right? But when the result of a matrix operation can be seen as a vector (nx1 or 1xn), NumPy returns it as a 1d vector. The shape of such a vector is (n,), while the shape of a 2d matrix is (n,1) or (1,n); that's the difference between them.
When broadcasting, NumPy treats a 1d vector like a row vector, because it pads the missing dimensions on the left. So the (100,) XW can't broadcast with the (100,3) X. But after reshaping, it becomes a (100,1) matrix, and then they can broadcast.
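A quick check of this (the padding happens on the left, so a (100,) array acts like a row):

import numpy as np

XW = np.ones(100)                       # shape (100,)
# broadcasting pads on the left, so XW acts like shape (1, 100):
print((np.ones((3, 100)) * XW).shape)   # (3, 100) -- works as a "row vector"
# against (100, 3) that (1, 100) view is incompatible:
# np.ones((100, 3)) * XW                # ValueError
# after reshaping to (100, 1) it acts as a true column:
print((np.ones((100, 3)) * XW.reshape(100, 1)).shape)   # (100, 3)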