How to sum the second column's values according to the first column's value in a PyTorch tensor

I have a tensor in PyTorch. Its first column can only take a limited set of values, but its second column can take any value, e.g.:

val = torch.tensor([[1,233],
                    [1,222],
                    [2,333],
                    [2,3234],
                    [2,3242],
                    [2,3234],
                    [3,234],
                    [3,234],
                    [4,323]])

Now I want to sum all values in the second column whose corresponding first-column values are the same. The output should be as follows:

output_val = torch.tensor([[1, 455],
                           [2, 10043],
                           [3, 468],
                           [4, 323]])

I want to use PyTorch's tensor APIs to handle this task instead of Python for/while loops, because I have billions of records to process and loop-based code would take several days. Any suggestion is welcome. Thanks!

Answer:

You are looking for index_add_, where your first column is the index and the second one is the source tensor.
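As a minimal sketch of that suggestion, assuming the keys in the first column are small non-negative integers (so each key can be used directly as a slot position in the accumulator). The names keys, values, sums and present are illustrative only:

```python
import torch

val = torch.tensor([[1, 233], [1, 222], [2, 333], [2, 3234], [2, 3242],
                    [2, 3234], [3, 234], [3, 234], [4, 323]])
keys, values = val[:, 0], val[:, 1]

# One accumulator slot per possible key 0..max(key); assumes every key is >= 0.
sums = torch.zeros(int(keys.max()) + 1, dtype=values.dtype)
sums.index_add_(0, keys, values)  # sums[keys[i]] += values[i]

# Keep only the slots whose key actually occurs (slot 0 is unused here).
# Counting occurrences is safer than testing sums != 0, since a group may sum to 0.
present = torch.bincount(keys) > 0
present_keys = torch.arange(sums.numel())[present]
output_val = torch.stack((present_keys, sums[present]), dim=-1)

print(output_val.tolist())  # [[1, 455], [2, 10043], [3, 468], [4, 323]]
```

This allocates one slot per possible key, so if the keys are large or sparse (e.g. IDs in the millions), remapping them with torch.unique(..., return_inverse=True) first keeps the accumulator only as large as the number of distinct keys.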

Answer:

Thanks to the suggestions from @Shai and @Alexander-guyer, I finally have a complete solution that uses PyTorch's vectorized APIs (and therefore its parallel computing power) for this kind of task. Here is my final solution:

The input tensor is:

import torch

val = torch.tensor([[1,233],
                    [1,222],
                    [2,333],
                    [2,3234],
                    [2,3242],
                    [2,3234],
                    [3,234],
                    [3,234],
                    [4,323]])

Now split it into its first and second columns, val0 and val1:

val0=val[:,0]
val1=val[:,1]

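Next, use torch.unique() with return_inverse=True to get the unique values of the first column (uniq_val0) and the inverse indices (index0), which give, for each row, the position of its first-column value in uniq_val0. Then create a zero tensor with one slot per unique value to hold the sums: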

uniq_val0, index0 = torch.unique(val0, return_inverse=True)
# match val1's dtype and device, since index_add_ requires them to agree
zero_sum = torch.zeros(uniq_val0.shape, dtype=val1.dtype, device=val1.device)
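To illustrate what return_inverse provides, here is a small sketch using the example's first column, plus a made-up key column with sparse values (sparse_keys is illustrative only). index0[i] is the position of val0[i] inside uniq_val0, so indexing uniq_val0 with index0 rebuilds val0:

```python
import torch

val0 = torch.tensor([1, 1, 2, 2, 2, 2, 3, 3, 4])
uniq_val0, index0 = torch.unique(val0, return_inverse=True)
print(uniq_val0.tolist())  # [1, 2, 3, 4]
print(index0.tolist())     # [0, 0, 1, 1, 1, 1, 2, 2, 3]

# The inverse indices map every row back to its key: uniq_val0[index0] == val0.
assert torch.equal(uniq_val0[index0], val0)

# Large or sparse keys are compacted into 0..(number of distinct keys - 1),
# which keeps the accumulator used by index_add_ small.
sparse_keys = torch.tensor([500, 10, 10, 500])
sparse_uniq, sparse_inv = torch.unique(sparse_keys, return_inverse=True)
print(sparse_uniq.tolist())  # [10, 500]
print(sparse_inv.tolist())   # [1, 0, 0, 1]
```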

Now we can call index_add_() with index0 from the previous step to add each value of val1 into the slot of its group:

output_val1=zero_sum.index_add_(0, index0, val1)
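For reference, index_add_(0, index0, val1) adds val1[i] into position index0[i] of the tensor it is called on, in place, and returns that same tensor (so output_val1 and zero_sum share data). A small sketch with the example's values; scatter_add_ is shown only as an alternative that, for 1-D tensors like these, computes the same sums:

```python
import torch

index0 = torch.tensor([0, 0, 1, 1, 1, 1, 2, 2, 3])
val1 = torch.tensor([233, 222, 333, 3234, 3242, 3234, 234, 234, 323])

# index_add_: acc[index0[i]] += val1[i] for every i, along dim 0.
acc = torch.zeros(4, dtype=val1.dtype)
result = acc.index_add_(0, index0, val1)
print(result.tolist())  # [455, 10043, 468, 323]

# The in-place call modifies acc and returns a tensor backed by the same memory.
assert result.data_ptr() == acc.data_ptr()

# scatter_add_ gives the same sums for these 1-D inputs.
alt = torch.zeros(4, dtype=val1.dtype).scatter_add_(0, index0, val1)
assert torch.equal(alt, result)
```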

Finally, stack uniq_val0 and output_val1 together column-wise, which gives exactly what we want:

output_val=torch.stack((uniq_val0, output_val1),-1)

Printing the result confirms it matches the expected output:

print(output_val)

tensor([[    1,   455],
        [    2, 10043],
        [    3,   468],
        [    4,   323]])
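The steps above can be wrapped into a reusable function. This is an illustrative sketch: group_sum and group_sum_chunked are hypothetical helper names, and the chunked variant assumes the records arrive as a sequence of (N, 2) tensors (simulated here with torch.split), which may help when billions of rows do not fit in memory at once. Because a key can appear in several chunks, the per-chunk partial sums are concatenated and grouped one more time. The plain Python loop is included only to cross-check the result on small inputs.

```python
import torch

def group_sum(keys: torch.Tensor, values: torch.Tensor):
    """Sum `values` per distinct entry of `keys`; returns (sorted unique keys, sums)."""
    uniq, inverse = torch.unique(keys, return_inverse=True)
    sums = torch.zeros(uniq.shape, dtype=values.dtype, device=values.device)
    sums.index_add_(0, inverse, values)
    return uniq, sums

def group_sum_chunked(chunks):
    """Hypothetical helper: reduce an iterable of (N, 2) tensors chunk by chunk."""
    partial_keys, partial_sums = [], []
    for chunk in chunks:
        k, s = group_sum(chunk[:, 0], chunk[:, 1])
        partial_keys.append(k)
        partial_sums.append(s)
    # The same key may appear in several chunks, so group the partial results again.
    return group_sum(torch.cat(partial_keys), torch.cat(partial_sums))

val = torch.tensor([[1, 233], [1, 222], [2, 333], [2, 3234], [2, 3242],
                    [2, 3234], [3, 234], [3, 234], [4, 323]])

keys, sums = group_sum(val[:, 0], val[:, 1])
output_val = torch.stack((keys, sums), dim=-1)

# Same result when the rows are processed in three chunks of three rows.
chunked = torch.stack(group_sum_chunked(torch.split(val, 3)), dim=-1)

# Reference Python loop, for checking small inputs only.
expected = {}
for k, v in val.tolist():
    expected[k] = expected.get(k, 0) + v

print(output_val.tolist())  # [[1, 455], [2, 10043], [3, 468], [4, 323]]
```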