I have a dataset for training networks, consisting of two tensors: my features and my labels. In my demonstration set, the features have shape [351, 4, 34] and the labels have shape [351].
Now I would like to reshape the dataset into chunks of size k (ideally while loading the data with a DataLoader), to obtain a new demonstration set with features of shape [351 * n, 4, k] and corresponding labels of shape [351 * n], where n = floor(34 / k). The main aim is to reduce the length of each feature, so that I can decrease the size of my network afterwards.
As a written example: starting from
t = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
i.e. a [2, 4]-tensor, with
l = [1, 0]
as labels, I would like to be able to go to (with k = 2)
t = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]
l = [1, 1, 0, 0]
or to (with k = 3)
t = [[1, 2, 3],
     [5, 6, 7]]
l = [1, 0]
I found some solutions for reshaping one of the tensors (using variations of split()), but then I would have to apply the same transformation to the other tensor as well, so I'd prefer a solution inside my DataLoader instead.
Is that possible?
CodePudding user response:
You can reshape the input to the desired shape (the first dimension becomes n times longer), while the labels can be repeated with torch.repeat_interleave.
import torch

def split(x, y, k=2):
    n = x.size(1) // k  # number of length-k chunks per sample, i.e. floor(size / k)
    # drop the remainder, then fold dim 1 into chunks of length k
    x_ = x[:, :n * k].reshape(len(x) * n, k)
    # repeat each label once for every chunk taken from its sample
    y_ = y.repeat_interleave(n)
    return x_, y_
You can test it like so:
>>> t = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> l = torch.tensor([1, 0])
>>> split(t, l, k=2)
(tensor([[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]]), tensor([1, 1, 0, 0]))
>>> split(t, l, k=3)
(tensor([[1, 2, 3],
         [5, 6, 7]]), tensor([1, 0]))
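The same idea extends to the 3-D features from the question by chunking along the last dimension. Here is a minimal sketch of that variant (split_3d is a hypothetical name, not part of the snippet above):
import torch

def split_3d(x, y, k=2):
    # x: [N, C, T] -> [N * n, C, k], chunking along the last dimension
    N, C, T = x.shape
    n = T // k
    # drop the remainder, expose a chunk axis, then merge it into the batch axis
    x_ = x[:, :, :n * k].reshape(N, C, n, k).permute(0, 2, 1, 3).reshape(N * n, C, k)
    y_ = y.repeat_interleave(n)
    return x_, y_
With features of shape [351, 4, 34] and k = 5, this would give features of shape [2106, 4, 5] and labels of shape [2106], since n = floor(34 / 5) = 6 and 351 * 6 = 2106.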
I recommend doing this kind of processing in your dataset class.
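As a minimal sketch of what that could look like, a small Dataset wrapper could apply the split once up front and then serve the chunked samples (the class name ChunkedDataset is only illustrative; it reuses split() from above):
from torch.utils.data import Dataset, DataLoader

class ChunkedDataset(Dataset):
    def __init__(self, features, labels, k=2):
        # precompute the chunked tensors once, using split() defined above
        self.x, self.y = split(features, labels, k=k)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
A DataLoader built on top of it then yields the chunked samples directly, e.g. DataLoader(ChunkedDataset(t, l, k=2), batch_size=2, shuffle=True).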