Question: Can somebody help me align these two approaches to data generation so that both of them can be used by the NN model below? When using approach (2) with numpy and torch.from_numpy(x), a runtime error occurs ("expected scalar type Float but found Double").
For data generation I have these two approaches:
import torch
import torch.nn as nn
import numpy as np

def get_training_data_1():
    x = torch.randn(batch_size, n_in)
    y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])
    return x, y

def get_training_data_2():
    x = np.random.rand(batch_size, n_in)
    y = np.array([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])
    x = torch.from_numpy(x)
    y = torch.from_numpy(y)
    return x, y

n_in, n_h, n_out, batch_size = 2, 5, 1, 10
x, y = get_training_data_2()
With this model I run into problems when using approach (2) with numpy and torch.from_numpy(x), while everything is OK when using approach (1):
#---- Create the NN model
model = nn.Sequential(nn.Linear(n_in, n_h),   # hidden layer
                      nn.ReLU(),              # activation layer
                      nn.Linear(n_h, n_out),  # output layer
                      nn.Sigmoid())           # squashes the output into (0, 1)

#---- Construct the loss function
criterion = torch.nn.MSELoss()

#---- Construct the optimizer (Stochastic Gradient Descent in this case)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

#---- Gradient descent
for epoch in range(1501):
    y_pred = model(x)            # forward pass: compute predicted y by passing x to the model
    loss = criterion(y_pred, y)  # compute the loss
    if epoch % 100 == 0:         # print the loss periodically
        print(epoch, loss.item())
    optimizer.zero_grad()        # zero the gradients
    loss.backward()              # backward pass (backpropagation)
    optimizer.step()             # update the parameters
CodePudding user response:
The default floating point type in torch is float32 (i.e. single precision), while in NumPy the default is float64 (double precision). torch.from_numpy preserves the NumPy dtype, so approach (2) feeds float64 inputs into a model whose weights are float32, which triggers the error.
Try changing get_training_data_2 so that it explicitly converts the numpy arrays to numpy.float32 before turning them into torch tensors:
def get_training_data_2():
    x = np.random.rand(batch_size, n_in).astype(np.float32)
    y = np.array([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]],
                 dtype=np.float32)
    x = torch.from_numpy(x)
    y = torch.from_numpy(y)
    return x, y
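An equivalent fix, if you prefer to leave the NumPy code untouched, is to cast after the conversion: Tensor.float() returns a torch.float32 copy of the tensor.

def get_training_data_2():
    x = np.random.rand(batch_size, n_in)
    y = np.array([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])
    # from_numpy() keeps float64; .float() casts the result down to float32
    x = torch.from_numpy(x).float()
    y = torch.from_numpy(y).float()
    return x, y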
Note: with the newer NumPy random API (numpy.random.Generator), you can generate float32 samples directly instead of casting float64 values to float32.
def get_training_data_2(rng):
    x = rng.random(size=(batch_size, n_in), dtype=np.float32)
    y = np.array([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]],
                 dtype=np.float32)
    x = torch.from_numpy(x)
    y = torch.from_numpy(y)
    return x, y

rng = np.random.default_rng()
x, y = get_training_data_2(rng)
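With either variant, both data generation approaches now return torch.float32 tensors, so the model from the question accepts them interchangeably; a quick sanity check:

x1, y1 = get_training_data_1()
x2, y2 = get_training_data_2(rng)
print(x1.dtype, y1.dtype)  # torch.float32 torch.float32
print(x2.dtype, y2.dtype)  # torch.float32 torch.float32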