I have a tensor in PyTorch. The values in its first column come from a limited set, but the values in its second column are arbitrary, e.g.:
val = torch.tensor([[1,233],
[1,222],
[2,333],
[2,3234],
[2,3242],
[2,3234],
[3,234],
[3,234],
[4,323]])
Now I want to sum all values in the second column whose corresponding first-column values are the same. The output should be as follows:
output_val=torch.tensor([[1,455],
[2,10043],
[3,468],
[4,323]])
I want to do this with PyTorch's tensor APIs rather than a Python for/while loop, because I have billions of records to process and a loop would take several days. Any suggestions are welcome. Thanks!
CodePudding user response:
You are looking for index_add_, where your first column is the index and the second one is src.
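A minimal sketch of this suggestion, assuming the first-column values are non-negative integers small enough to be used directly as indices (the names keys, values and sums are only illustrative). The result has one slot for every possible key from 0 to the largest key, so keys that never occur stay at zero:

```python
import torch

val = torch.tensor([[1, 233], [1, 222], [2, 333], [2, 3234], [2, 3242],
                    [2, 3234], [3, 234], [3, 234], [4, 323]])

keys = val[:, 0]    # used directly as indices: must be integers >= 0
values = val[:, 1]  # the numbers to accumulate

# One slot per possible key value 0..max(keys); slot 0 is unused in this example.
sums = torch.zeros(int(keys.max()) + 1, dtype=values.dtype)
sums.index_add_(0, keys, values)  # sums[keys[i]] += values[i] for every row i

print(sums.tolist())  # [0, 455, 10043, 468, 323]
```

If the keys are sparse or very large, this wastes memory on empty slots, which is why remapping the keys to a dense range first is usually preferable.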
CodePudding user response:
Thanks to @Shai and @Alexander-guyer for the suggestions. I finally have a complete solution that uses PyTorch's tensor APIs, and therefore its parallel computing power, instead of a Python loop. Here is my final solution:
Input value tensor is:
val = torch.tensor([[1,233],
[1,222],
[2,333],
[2,3234],
[2,3242],
[2,3234],
[3,234],
[3,234],
[4,323]])
Now split its first and second columns into val0 and val1:
val0=val[:,0]
val1=val[:,1]
Now use torch.unique() to get the first column's unique values into uniq_val0 and the inverse indices into index0, then allocate a zero tensor zero_sum with one slot per unique value:
uniq_val0, index0=torch.unique(val0, return_inverse=True)
zero_sum=torch.zeros(uniq_val0.shape, dtype=torch.int64)
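As a quick check of what torch.unique() returns here, the following small sketch uses only the first column of the example. By default the unique values come back sorted, and each entry of the inverse index gives the position of the original value within them:

```python
import torch

val0 = torch.tensor([1, 1, 2, 2, 2, 2, 3, 3, 4])
uniq_val0, index0 = torch.unique(val0, return_inverse=True)

# uniq_val0 holds the sorted distinct keys.
print(uniq_val0.tolist())  # [1, 2, 3, 4]

# index0[i] is the position of val0[i] inside uniq_val0, so it is a dense 0..K-1 index.
print(index0.tolist())     # [0, 0, 1, 1, 1, 1, 2, 2, 3]

# Indexing the unique values with the inverse index reconstructs the original column.
print(torch.equal(uniq_val0[index0], val0))  # True
```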
Now we can use index_add_() with the index0 from the previous step to compute the sums we want:
output_val1=zero_sum.index_add_(0, index0, val1)
Now we can stack uniq_val0 and output_val1 together to get the result:
output_val=torch.stack((uniq_val0, output_val1),-1)
Finally, print the result to confirm that it is what we want:
print(output_val)
tensor([[ 1, 455],
[ 2, 10043],
[ 3, 468],
[ 4, 323]])
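For reuse, the steps above could be wrapped in a single function. This is only a sketch: the name sum_by_key is hypothetical, and taking the accumulator's dtype and device from the values column is an assumption added here so the same code also runs on a GPU tensor.

```python
import torch

def sum_by_key(pairs: torch.Tensor) -> torch.Tensor:
    """Sum column 1 of an (N, 2) tensor, grouped by the value in column 0."""
    keys, values = pairs[:, 0], pairs[:, 1]
    # Map each key to a dense index 0..K-1 over the sorted unique keys.
    uniq, inverse = torch.unique(keys, return_inverse=True)
    # Accumulator with one slot per unique key, matching the values' dtype and device.
    sums = torch.zeros(uniq.shape[0], dtype=values.dtype, device=values.device)
    sums.index_add_(0, inverse, values)
    return torch.stack((uniq, sums), dim=-1)

val = torch.tensor([[1, 233], [1, 222], [2, 333], [2, 3234], [2, 3242],
                    [2, 3234], [3, 234], [3, 234], [4, 323]])
result = sum_by_key(val)
print(result.tolist())  # [[1, 455], [2, 10043], [3, 468], [4, 323]]
```

Because torch.unique() sorts its output by default, the rows of the result are ordered by key regardless of the order of the input rows.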