I was asked to create a dataset to test the code we are working on.
The dataset must contain a group of tensors that will later be used in a generative model.
I'm trying to save the tensors to a .pt file, but each save overwrites the previous one, so the file ends up containing only one tensor. I've read about torch.utils.data.Dataset,
but I can't figure out on my own how to use it.
Here is my code:
import torch
import numpy as np
from torch.utils.data import Dataset
#variables that will be used to create the size of the tensors:
num_jets, num_particles, num_features = 1, 30, 3
for i in range(100):
    #tensor from a gaussian dist with mean=5,std=1 and shape=size:
    tensor = torch.normal(5,1,size=(num_jets, num_particles, num_features))
    #We will need the tensors to be of the cpu type
    tensor = tensor.cpu()
    #save the tensor to 'tensor_dataset.pt'
    torch.save(tensor,'tensor_dataset.pt')
#open the recently created .pt file inside a list
tensor_list = torch.load('tensor_dataset.pt')
#prints the list. Just one tensor inside .pt file
print(tensor_list)
CodePudding user response:
Reason: You overwrite tensor on every iteration of the loop, and torch.save replaces the file rather than appending to it, so no list is ever built and only the last tensor is left at the end.
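To see the overwrite in isolation, here is a minimal sketch. The file name demo.pt and the temporary directory are assumptions made only for this illustration; they are not part of the original code.

```python
import os
import tempfile

import torch

# Hypothetical file path inside a temporary directory, used only for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.pt")

    first = torch.zeros(2, 3)
    second = torch.ones(2, 3)

    torch.save(first, path)   # writes the file
    torch.save(second, path)  # replaces the file; nothing is appended

    loaded = torch.load(path)

# Only the tensor from the second save survives.
print(torch.equal(loaded, second))
```

Saving twice to the same path therefore behaves like assigning twice to the same variable: the earlier value is gone.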
Solution: Since you know the size of each tensor, you can preallocate a single tensor, lst_tensors, with an extra leading dimension of 100, fill it inside the loop, and save it once after the loop:
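import torch
import numpy as np
from torch.utils.data import Dataset
num_jets, num_particles, num_features = 1, 30, 3
#preallocate one tensor that holds all 100 samples along the first dimension
#(torch.empty allocates on the CPU by default, so no .cpu() call is needed)
lst_tensors = torch.empty(size=(100,num_jets, num_particles, num_features))
for i in range(100):
    #fill slot i with a sample from a gaussian dist with mean=5,std=1
    lst_tensors[i] = torch.normal(5,1,size=(num_jets, num_particles, num_features))
#save once, after the loop, so the file contains all 100 samples
torch.save(lst_tensors,'tensor_dataset.pt')
tensor_list = torch.load('tensor_dataset.pt')
print(tensor_list.shape) # torch.Size([100, 1, 30, 3])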
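Since the question also asks about torch.utils.data.Dataset, here is one possible way to wrap the stacked tensor so a DataLoader can batch it. This is a sketch, not the only approach: the class name JetTensorDataset, the variable num_samples, and the batch size of 10 are assumptions chosen for illustration. The tensor is generated in memory here; in practice you could pass the result of torch.load('tensor_dataset.pt') instead.

```python
import torch
from torch.utils.data import DataLoader, Dataset

num_jets, num_particles, num_features = 1, 30, 3
num_samples = 100  # same count as the loop above

# torch.normal accepts a size argument, so all samples can be drawn in
# one call instead of filling a preallocated tensor in a loop.
all_tensors = torch.normal(
    5.0, 1.0, size=(num_samples, num_jets, num_particles, num_features)
)


class JetTensorDataset(Dataset):
    """Hypothetical wrapper: one item per tensor along the first dimension."""

    def __init__(self, tensors):
        self.tensors = tensors

    def __len__(self):
        # Number of samples is the size of the leading dimension.
        return self.tensors.shape[0]

    def __getitem__(self, index):
        # Returns a single tensor of shape (num_jets, num_particles, num_features).
        return self.tensors[index]


dataset = JetTensorDataset(all_tensors)
loader = DataLoader(dataset, batch_size=10, shuffle=False)

first_batch = next(iter(loader))
print(len(dataset), first_batch.shape)
```

A custom Dataset only needs __len__ and __getitem__; the DataLoader then handles batching, so the first batch here stacks 10 samples along a new leading dimension.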