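I have two tensors. The first is 1D (e.g. a tensor of 3 values). The second is 2D: its first column holds indices into the first tensor, in a one-to-many relationship, and its second column holds values (e.g. a tensor of shape (6, 2)). For each element of the first tensor, I want to multiply it by each of its associated values and sum the results.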
# simple example: multiply each a[i] by each of its values in b and sum
import torch
a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]]) # 1st column is the index into tensor a, 2nd column is the value
# desired output, computed by hand:
output = [2*2 + 2*3 + 2*1, 4*4, 3*3 + 3*1]
# i.e. output = [12, 16, 12]
Currently, I compute the size of each id's group in b (e.g. [3, 1, 2]), use torch.split to group the values into a list of tensors, and then loop through the groups. That is fine for small tensors, but when the tensors have millions of elements, with tens of thousands of arbitrarily sized groups, it becomes very slow.
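For comparison, the split-and-loop approach described above might look roughly like this. It is a reconstruction for illustration, not the asker's actual code, and it assumes b is already sorted by its first column, because torch.split only produces contiguous groups.

```python
import torch

a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])

# Assumption: b is sorted by its first column (the index into a),
# so each group of rows is contiguous.
sizes = torch.bincount(b[:, 0], minlength=a.shape[0]).tolist()  # [3, 1, 2]
groups = torch.split(b[:, 1], sizes)  # one tensor of values per index

# One Python-level iteration per group: this is the part that gets slow
# with tens of thousands of groups.
output = torch.stack([a[i] * g.sum() for i, g in enumerate(groups)])
print(output)  # tensor([12, 16, 12])
```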
Any better solutions?
Answer:
You can use numpy.bincount or torch.bincount with the second column of b as weights, which sums the values of b grouped by key (the first column):
import numpy as np
a = np.array([2,4,3])
b = np.array([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
print(np.bincount(b[:, 0], weights=b[:, 1]))
# [6. 4. 4.]
print(a * np.bincount(b[:, 0], weights=b[:, 1]))
# [12. 16. 12.]
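One caveat: np.bincount returns an array of length max(key) + 1, so if the last entries of a have no matching rows in b, the result is shorter than a and the multiplication fails to broadcast. Passing minlength=len(a) pads the missing keys with zeros. A small sketch, using a hypothetical fourth value in a that has no rows in b:

```python
import numpy as np

a = np.array([2, 4, 3, 5])  # hypothetical: index 3 has no rows in b
b = np.array([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])

# weights= sums the second column per key; minlength pads unused keys with 0
sums = np.bincount(b[:, 0], weights=b[:, 1], minlength=len(a))
print(sums)      # [6. 4. 4. 0.]
print(a * sums)  # [12. 16. 12.  0.]
```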
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
torch.bincount(b[:,0], b[:,1])
# tensor([6., 4., 4.], dtype=torch.float64)
a * torch.bincount(b[:,0], b[:,1])
# tensor([12., 16., 12.], dtype=torch.float64)
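Unlike the split-based loop, bincount does not need b to be sorted by key. A quick sanity check on larger random data (the sizes, seed, and variable names are arbitrary choices for illustration) compares it against a straightforward per-key loop; minlength keeps the result as long as a even if some keys never occur:

```python
import torch

torch.manual_seed(0)
n_ids, n_rows = 1000, 10_000
a = torch.rand(n_ids, dtype=torch.float64)
ids = torch.randint(0, n_ids, (n_rows,))  # unsorted keys into a
vals = torch.rand(n_rows, dtype=torch.float64)

# Vectorized: a single call, no sorting or Python loop.
fast = a * torch.bincount(ids, weights=vals, minlength=n_ids)

# Reference: explicit per-key loop (slow, for checking only).
slow = torch.stack([a[i] * vals[ids == i].sum() for i in range(n_ids)])

print(torch.allclose(fast, slow))  # True
```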
References:
- numpy.bincount official documentation
- torch.bincount official documentation
- How can I reduce a numpy array based on a key rather than an axis?
Answer:
Another alternative in PyTorch, if gradients are needed:
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
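# sum the values (2nd column) per index (1st column), then scale by a
# note: autograd only tracks floating-point tensors, so use float dtypes to get gradients
output = torch.zeros(a.shape[0], dtype=torch.long).index_add_(0, b[:, 0], b[:, 1]) * a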
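Gradients only flow through floating-point tensors (integer tensors cannot have requires_grad=True), so to actually backpropagate, a and the values need a float dtype. A minimal sketch with float inputs, using the out-of-place Tensor.index_add; the split of b into ids and vals is just for readability:

```python
import torch

a = torch.tensor([2.0, 4.0, 3.0], requires_grad=True)
ids = torch.tensor([0, 0, 0, 1, 2, 2])  # first column of b
vals = torch.tensor([2.0, 3.0, 1.0, 4.0, 3.0, 1.0], requires_grad=True)  # second column

# Per-index sums of vals, then scale each sum by the matching entry of a.
sums = torch.zeros(a.shape[0], dtype=vals.dtype).index_add(0, ids, vals)
output = sums * a  # values 12, 16, 12

output.sum().backward()
print(a.grad)     # tensor([6., 4., 4.])  (the per-index sums)
print(vals.grad)  # tensor([2., 2., 2., 4., 3., 3.])  (a[ids] for each value)
```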
Alternatively, torch.Tensor.scatter_add also works.
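A sketch of the scatter_add variant on the same example data. With a 1-D index and a 1-D source, scatter_add(0, ids, vals) accumulates out[ids[j]] += vals[j], so here it computes the same per-index sums as index_add:

```python
import torch

a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])

ids, vals = b[:, 0], b[:, 1]
# scatter_add(dim, index, src): out[ids[j]] += vals[j]
sums = torch.zeros(a.shape[0], dtype=vals.dtype).scatter_add(0, ids, vals)
output = sums * a
print(output)  # tensor([12, 16, 12])
```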