I have a dataset for training networks, consisting of two tensors: my features and my labels. In my demonstration set, the features have shape [351, 4, 34] and the labels have shape [351].
Now I would like to reshape the dataset into chunks of size k (ideally while loading the data with a DataLoader), to obtain a new demonstration set with features of shape [351 * n, 4, k] and corresponding labels of shape [351 * n], where n = floor(34 / k). The main aim is to reduce the length of each feature, so that I can decrease the size of my network afterwards.
As a written example: starting from
t = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
i.e. a [2, 4]-tensor, with
l = [1, 0]
as labels, I would like to be able to go to (with k = 2)
t = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]
l = [1, 1, 0, 0]
or to (with k = 3)
t = [[1, 2, 3],
     [5, 6, 7]]
l = [1, 0]
I found some solutions for reshaping one of the tensors (using variations of split()), but then I would have to apply the same transformation to the other tensor as well, so I'd prefer a solution inside my DataLoader instead.
Is that possible?
CodePudding user response:
You can reshape the input to the desired shape (the first dimension becomes n times longer), while the labels can be repeated with torch.repeat_interleave.
import torch

def split(x, y, k=2):
    n = x.size(1) // k  # number of length-k chunks per sample, i.e. floor(size / k)
    # drop the remainder, then fold dim 1 into chunks of length k
    x_ = x[:, :n * k].reshape(len(x) * n, k)
    # repeat each label once for every chunk taken from its sample
    y_ = y.repeat_interleave(n)
    return x_, y_
You can test it like so:
>>> t = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> l = torch.tensor([1, 0])
>>> split(t, l, k=2)
(tensor([[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]]), tensor([1, 1, 0, 0]))
>>> split(t, l, k=3)
(tensor([[1, 2, 3],
         [5, 6, 7]]), tensor([1, 0]))
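The same idea extends to the 3-D features from the question by chunking along the last dimension. Here is a minimal sketch of that variant (split_3d is a hypothetical name, not part of the snippet above):
import torch

def split_3d(x, y, k=2):
    # x: [N, C, T] -> [N * n, C, k], chunking along the last dimension
    N, C, T = x.shape
    n = T // k
    # drop the remainder, expose a chunk axis, then merge it into the batch axis
    x_ = x[:, :, :n * k].reshape(N, C, n, k).permute(0, 2, 1, 3).reshape(N * n, C, k)
    y_ = y.repeat_interleave(n)
    return x_, y_
With features of shape [351, 4, 34] and k = 5, this would give features of shape [2106, 4, 5] and labels of shape [2106], since n = floor(34 / 5) = 6 and 351 * 6 = 2106.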
I recommend doing this kind of processing in your dataset class.
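As a minimal sketch of what that could look like, a small Dataset wrapper could apply the split once up front and then serve the chunked samples (the class name ChunkedDataset is only illustrative; it reuses split() from above):
from torch.utils.data import Dataset, DataLoader

class ChunkedDataset(Dataset):
    def __init__(self, features, labels, k=2):
        # precompute the chunked tensors once, using split() defined above
        self.x, self.y = split(features, labels, k=k)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
A DataLoader built on top of it then yields the chunked samples directly, e.g. DataLoader(ChunkedDataset(t, l, k=2), batch_size=2, shuffle=True).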