Splitting a PyTorch dataloader into NumPy arrays

In principle I'd like to do the opposite of what was done here https://datascience.stackexchange.com/questions/45916/loading-own-train-data-and-labels-in-dataloader-using-pytorch.

I have a PyTorch dataloader, train_dataloader, with shape (2000, 3). I want to store its 3 columns in 3 separate NumPy arrays. (The first column of the dataloader contains the data, the second column contains the labels.)

I managed to do it for the last batch of the train_dataloader (see below), but unfortunately couldn't make it work for the whole train_dataloader.

import numpy as np

# After the loop finishes, X and y hold only the last batch
for X, y, ind in train_dataloader:
    pass

train_X = np.asarray(X, dtype=np.float32)
train_y = np.asarray(y, dtype=np.float32)

Any help would be very much appreciated!

User response:

You can collect every batch in a list while iterating, then concatenate the batches once the loop is done:

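import torch

all_X = []
all_y = []
for X, y, ind in train_dataloader:
    # Keep every batch instead of overwriting the previous one
    all_X.append(X)
    all_y.append(y)
# Join the batches along the batch dimension, then convert to NumPy
train_X = torch.cat(all_X, dim=0).numpy()
train_y = torch.cat(all_y, dim=0).numpy()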
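The same idea extends to all three columns. Below is a minimal, self-contained sketch. It assumes the dataloader yields three tensors per batch, as in the question. The toy TensorDataset and the helper name dataloader_to_numpy are made up for illustration and are not part of PyTorch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the asker's data: 10 samples with 4 features each,
# a label per sample, and the sample's index as the third column.
features = torch.arange(40, dtype=torch.float32).reshape(10, 4)
labels = torch.arange(10, dtype=torch.float32) % 2
indices = torch.arange(10)
train_dataloader = DataLoader(
    TensorDataset(features, labels, indices), batch_size=4, shuffle=True
)


def dataloader_to_numpy(loader):
    """Concatenate each column of every batch into one NumPy array per column."""
    columns = None
    for batch in loader:
        if columns is None:
            # One list per column, sized from the first batch
            columns = [[] for _ in batch]
        for column, tensor in zip(columns, batch):
            # detach() and cpu() keep .numpy() working for tensors that
            # track gradients or live on a GPU
            column.append(tensor.detach().cpu())
    return tuple(torch.cat(column, dim=0).numpy() for column in columns)


train_X, train_y, train_ind = dataloader_to_numpy(train_dataloader)

# With shuffle=True the rows arrive in iteration order; sorting by the
# index column restores the dataset's original order.
order = np.argsort(train_ind)
train_X, train_y, train_ind = train_X[order], train_y[order], train_ind[order]
```

If the dataloader shuffles, the three arrays still stay aligned with each other row by row, since they are built from the same batches. The final sort is only needed when the original dataset order matters.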