I am trying to implement a dense neural network in Python. I am stuck on the code that computes the layer outputs from the weights and biases. When I apply the * operator between the weight matrix and the input at a certain index, I get the error: ValueError: operands could not be broadcast together with shapes (100,784) (1000,784,1). Am I applying bad indexing in the loop, or what else am I doing wrong? Please help.
import os
import cv2
import numpy as np
from sklearn.utils import shuffle   # shuffle() below matches sklearn.utils.shuffle

##initialize the training and test arrays
train = np.empty((1000,28,28), dtype='float64')
trainY = np.zeros((1000,10,1))
test = np.empty((10000,28,28), dtype='float64')
testY = np.zeros((10000,10,1))
##reading image data into the training array
#load the images
i = 0
for filename in os.listdir('Data/Training1000/'):
    y = int(filename[0])    # first character of the filename is the label
    trainY[i,y] = 1.0
    train[i] = cv2.imread('Data/Training1000/{0}'.format(filename),0)/255.0
    i += 1
##reading image data into the testing array
i = 0
for filename in os.listdir('Data/Test10000'):
    y = int(filename[0])
    testY[i,y] = 1.0
    test[i] = cv2.imread('Data/Test10000/{0}'.format(filename),0)/255.0
    i += 1
##reshape the training and testing arrays
trainX = train.reshape(train.shape[0],train.shape[1]*train.shape[2],1)
testX = test.reshape(test.shape[0],test.shape[1]*test.shape[2],1)
##declare the hidden layer sizes, epochs, and learning rate
numNeuronsLayer1 = 100
numNeuronsLayer2 = 10
numEpochs = 100
learningRate = 0.1
##section to declare the weights and the biases
w1 = np.random.uniform(low=-0.1, high=0.1, size=(numNeuronsLayer1,784))
b1 = np.random.uniform(low=-1, high=1, size=(numNeuronsLayer1,1))
w2 = np.random.uniform(low=-0.1, high=0.1, size=(numNeuronsLayer2,numNeuronsLayer1))
b2 = np.random.uniform(low=-0.1, high=0.1, size=(numNeuronsLayer2,1))
##do the forward pass on the weights and the biases
for n in range(0, numEpochs):
    loss = 0
    trainX, trainY = shuffle(trainX, trainY)
    for i in range(trainX.shape[0]):
        ##this is where I have a problem, the line below throws the error described above
        ##my first pass is declared a2
        a2 = w1*train[i] + w2*trainX[i] + b1
How do I correctly reference my training variables inside the loop above to get rid of the broadcast error? Thank you.
CodePudding user response:
You are very close, but with a couple of problems. First, you need to be doing matrix multiplication. The * operator does element-wise multiplication (e.g., np.array([1,2,3]) * np.array([2,3,4]) gives np.array([2,6,12])). To do matrix multiplication with numpy you can use the @ operator (i.e., matrix1 @ matrix2) or the np.matmul function.
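To see the difference between the two operators, here is a quick sketch with small toy arrays (the names A and B are just for illustration):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.arange(6).reshape(3, 2)   # shape (3, 2)

# Element-wise: operands must have matching (or broadcastable) shapes.
elementwise = A * A              # shape stays (2, 3)

# Matrix product: inner dimensions must agree, (2,3) @ (3,2) -> (2,2).
product = A @ B                  # same result as np.matmul(A, B)

print(elementwise.shape)  # (2, 3)
print(product.shape)      # (2, 2)
```

The broadcast error in the question comes from using the first form where the second was needed.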
Your other problem is the shape of your inputs. I am not sure why you are adding a 3rd dimension (the trailing 1 in train.reshape(train.shape[0],train.shape[1]*train.shape[2],1)). You should be fine keeping each example as a flat row: change it to train.reshape(train.shape[0],train.shape[1]*train.shape[2]) and change the test.reshape accordingly.
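Concretely, dropping the trailing 1 turns the data into a plain 2-D matrix of flattened images, which is what the matrix products below expect:

```python
import numpy as np

# Stand-in for the loaded image stack: 1000 images of 28x28 pixels.
train = np.empty((1000, 28, 28), dtype='float64')

# Flatten each 28x28 image into a single 784-element row.
trainX = train.reshape(train.shape[0], train.shape[1] * train.shape[2])

print(trainX.shape)  # (1000, 784)
```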
Finally, your inference line a2 = w1*train[i] + w2*trainX[i] + b1 is a little off: you must first calculate a1 before a2. An important part of matrix multiplication is that the inner dimensions must agree (i.e., you cannot multiply matrices of shapes [100,50] and [100,50], but you can multiply matrices of shapes [100,50] and [50,60]; the resulting shape of the matrix product is the outer dimensions of the operands, in this case [100,60]). Thanks to matrix multiplication, you can also get rid of the for loop over training examples: all examples are calculated at the same time. So to calculate a1, we need to transpose w1 and put it on the right-hand side:
a1 = ( trainX @ w1.transpose() ) + b1.transpose()
Then we can calculate a2 as a function of a1:
a2 = ( a1 @ w2.transpose() ) + b2.transpose()
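Putting those two lines together, here is a minimal end-to-end check of the shapes, with random numbers standing in for the image data (the array sizes match the question; the data itself is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 1000 flattened 28x28 images.
trainX = rng.random((1000, 784))

numNeuronsLayer1, numNeuronsLayer2 = 100, 10
w1 = rng.uniform(-0.1, 0.1, size=(numNeuronsLayer1, 784))
b1 = rng.uniform(-1, 1, size=(numNeuronsLayer1, 1))
w2 = rng.uniform(-0.1, 0.1, size=(numNeuronsLayer2, numNeuronsLayer1))
b2 = rng.uniform(-0.1, 0.1, size=(numNeuronsLayer2, 1))

# Forward pass over all examples at once; the inner dims agree at each step.
a1 = (trainX @ w1.transpose()) + b1.transpose()  # (1000,784)@(784,100) -> (1000,100)
a2 = (a1 @ w2.transpose()) + b2.transpose()      # (1000,100)@(100,10)  -> (1000,10)

print(a1.shape, a2.shape)  # (1000, 100) (1000, 10)
```

Note there is no per-example loop: one matrix product handles the whole batch, which is also much faster than iterating in Python.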