Slice 3d-tensor-based dataset into smaller tensor lengths


I have a dataset for training networks, consisting of two tensors: features and labels. My demonstration set has features of shape [351, 4, 34] and labels of shape [351].

Now, I would like to reshape the dataset into chunks of size k (ideally while loading data with a DataLoader), to obtain a new demonstration set with features of shape [351 * n, 4, k] and corresponding labels of shape [351 * n], where n = floor(34 / k). The main aim is to reduce the length of each feature so I can shrink my network afterwards.

As written example: Starting from

t = [[1, 2, 3, 4], 
     [5, 6, 7, 8]]

i.e. a [2, 4]-tensor, with

l = [1, 0]

as labels, I would like to be able to go to (with k = 2)

t = [[1, 2], 
     [3, 4], 
     [5, 6], 
     [7, 8]]
l = [1, 1, 0, 0]

or to (with k = 3)

t = [[1, 2, 3], 
     [5, 6, 7]]
l = [1, 0]

I found some solutions for reshaping one of the tensors (using variations of split()), but I would then have to apply the same transformation to my labels as well, so I'd prefer a solution inside my DataLoader instead.

Is that possible?

CodePudding user response:

You can reshape the input to the desired shape (the first dimension becomes n times longer), while the labels can be repeated with torch.repeat_interleave.

def split(x, y, k=2):
    # number of chunks per row; integer division avoids needing math.floor
    n = x.size(1) // k
    # drop the remainder columns first, then reshape into rows of length k
    x_ = x[:, :n * k].reshape(len(x) * n, k)
    # repeat each label once per chunk taken from its row
    y_ = y.repeat_interleave(n)
    return x_, y_

You can test it like so:

>>> split(t, l, k=2)
(tensor([[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]]), tensor([1, 1, 0, 0]))

>>> split(t, l, k=3)
(tensor([[1, 2, 3],
         [5, 6, 7]]), tensor([1, 0]))

I recommend doing this kind of processing in your dataset class.
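For the 3D shapes from the question ([351, 4, 34] features), the same idea needs one extra step: after trimming the remainder, the chunk axis has to be moved next to the batch axis before flattening, so that each chunk stays aligned with its repeated label. A minimal sketch of such a dataset class (ChunkedDataset is a hypothetical name, not a PyTorch API):

```python
import torch
from torch.utils.data import Dataset

class ChunkedDataset(Dataset):
    """Hypothetical dataset that slices the last feature dimension
    into chunks of length k, duplicating each label once per chunk.

    features: tensor of shape [N, C, L]  (e.g. [351, 4, 34])
    labels:   tensor of shape [N]
    """
    def __init__(self, features, labels, k):
        n = features.size(-1) // k          # chunks per sample
        N, C = features.size(0), features.size(1)
        # drop the remainder, split the last axis into (n, k) chunks
        f = features[..., :n * k].reshape(N, C, n, k)
        # move the chunk axis next to the batch axis, then flatten them
        self.features = f.permute(0, 2, 1, 3).reshape(N * n, C, k)
        # one label per chunk, in the same order as the flattened chunks
        self.labels = labels.repeat_interleave(n)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return self.features[i], self.labels[i]
```

A DataLoader wrapped around this dataset then yields batches of the shorter [C, k] features directly, e.g. `DataLoader(ChunkedDataset(t, l, k=5), batch_size=32)`.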

  • Related