So I am trying to write a function that trains an MLP using PyTorch. My code is as follows:
import torch

def mlp_gradient_descent(x, y, model, eta=1e-6, nb_iter=30000):
    loss_descent = []
    dtype = torch.float
    device = torch.device("cpu")
    x = torch.from_numpy(x)
    y = torch.from_numpy(y)
    params = model.parameters()
    learning_rate = eta
    for t in range(nb_iter):
        y_pred = model(x)
        loss = (y_pred - y).pow(2).sum()
        print(loss)
        if t % 100 == 99:
            print(t, loss.item())
            loss_descent.append([t, loss.item()])
        loss.backward()
        with torch.no_grad():
            for param in params:
                param -= learning_rate * param.grad
            for param in params:
                param = None
and I am getting this error:
mat1 and mat2 must have the same dtype
Note that the error is raised by the model(x) call, and that x and y are NumPy arrays.
Thank you all, and have a great day.
CodePudding user response:
This error message indicates that two matrices (mat1 and mat2) being operated on inside your model do not have the same data type (dtype). In PyTorch, data types matter because they determine how memory is allocated and how the computations are performed.
In the line that raises the error, model(x), the model's parameters have PyTorch's default dtype torch.float32, while x was created with torch.from_numpy, which keeps the NumPy array's dtype. NumPy defaults to float64, so x is most likely a torch.float64 tensor and does not match the parameters. This kind of mismatch happens whenever a tensor is created or loaded from a source that assigns a different dtype by default.
To fix this, make sure both matrices have the same dtype by converting one of them with the .to() method. For example, to give mat2 the same dtype as mat1, you can write mat2 = mat2.to(mat1.dtype). In your case, the simplest fix is to convert the inputs right after loading them from NumPy, e.g. x = torch.from_numpy(x).to(torch.float32).
You can also check the dtype of each tensor before doing any operation via its .dtype attribute.
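For example, here is a minimal sketch of the dtype check and fix (the shapes and the torch.nn.Linear model are made up for illustration; the point is that NumPy arrays are float64 by default while PyTorch parameters are float32):

import numpy as np
import torch

x = np.random.rand(100, 3)                 # NumPy arrays are float64 by default
y = np.random.rand(100, 1)
model = torch.nn.Linear(3, 1)              # PyTorch parameters are float32 by default

print(torch.from_numpy(x).dtype)           # torch.float64
print(next(model.parameters()).dtype)      # torch.float32

# Convert the inputs to the model's dtype before calling model(x)
x = torch.from_numpy(x).to(torch.float32)
y = torch.from_numpy(y).to(torch.float32)
y_pred = model(x)                          # no dtype error anymore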
Additionally, in the provided code you update the model's parameters with a manual gradient-descent rule, but you never clear the gradients after the backward pass. The loop for param in params : param = None only rebinds the loop variable; it does not reset param.grad, so gradients keep accumulating across iterations and the updates go wrong. Instead, you should use the torch.optim package, or zero the gradients with param.grad.zero_() after the update step.
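Here is a minimal sketch of the corrected training loop, assuming x and y are float64 NumPy arrays as in the question; the manual update plus param.grad.zero_() could equally be replaced by a torch.optim.SGD optimizer with optimizer.step() and optimizer.zero_grad():

import torch

def mlp_gradient_descent(x, y, model, eta=1e-6, nb_iter=30000):
    loss_descent = []
    dtype = torch.float32
    # Match the model's dtype to avoid "mat1 and mat2 must have the same dtype"
    x = torch.from_numpy(x).to(dtype)
    y = torch.from_numpy(y).to(dtype)
    for t in range(nb_iter):
        y_pred = model(x)
        loss = (y_pred - y).pow(2).sum()
        if t % 100 == 99:
            print(t, loss.item())
            loss_descent.append([t, loss.item()])
        loss.backward()
        with torch.no_grad():
            # Iterate model.parameters() anew each step; a generator stored once
            # would be exhausted after the first pass
            for param in model.parameters():
                param -= eta * param.grad
                param.grad.zero_()   # reset the gradient so it does not accumulate
    return loss_descent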