How to count elements in a NumPy matrix like a pandas groupby?

I am new to NumPy and would like to know whether there is a method to count the occurrences of elements, as with a pandas group-by. This is the original matrix:

 matrix = [[ 0., 143.],
           [ 0., 170.],
           [ 0., 143.],
           [ 1., 269.],
           [ 0., 170.],
           [ 1., 269.],
           [ 0., 155.],
           [ 1., 269.]]

Expected output: identical rows should be grouped, as a group-by would do, and the number of occurrences stored in the last column:

matrix = [[0., 143., 2],
          [0., 170., 2],
          [1., 269., 3],
          [0., 155., 1]]

CodePudding user response:

There is torch.bincount, but it "only supports 1-d non-negative integral inputs".

In any case, if you need to count elements, it is better to work with integers, because floating-point rounding can make values that should be equal compare as different.
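To illustrate the floating-point concern, here is a minimal sketch in NumPy. The array `values` is made up for illustration, and scaling to tenths is only one possible way to get integers: two numbers that are mathematically equal can differ in their last bits, so they are counted as separate groups until they are rounded to integers.

```python
import numpy as np

# Hypothetical data: 0.1 + 0.2 and 0.3 are mathematically equal,
# but their floating-point representations differ slightly.
values = np.array([0.1 + 0.2, 0.3, 0.3])

# Counting the raw floats finds two distinct values instead of one.
raw_unique, raw_counts = np.unique(values, return_counts=True)

# Scaling to tenths and rounding to integers merges them again.
as_int = np.rint(values * 10).astype(int)
int_unique, int_counts = np.unique(as_int, return_counts=True)

print(len(raw_unique), raw_counts)
print(int_unique, int_counts)
```

With the data in the question this is not an issue, since values such as 143. and 269. are exactly representable, but it matters as soon as the values come out of a computation.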

For inputs with more than one dimension, you can write a custom function (here for a 2-D PyTorch tensor x):

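import torch

def count(x):
    # Row tensors hash by identity, so set(x) would not remove duplicates;
    # torch.unique with dim=0 returns the distinct rows instead.
    unique_rows = torch.unique(x, dim=0)
    counts = [
        (
            tuple(v.tolist()),
            (x == v).all(dim=-1).sum().item()
        )
        for v in unique_rows
    ]
    return dict(counts)

This returns the counts as a dictionary, similar to the standard library's Counter, which you can reformat later to suit your needs:

>>> count(torch.tensor(matrix).to(int))
{(0, 143): 2, (0, 155): 1, (0, 170): 2, (1, 269): 3}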
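As one possible way to reformat the dictionary into the layout asked for in the question, the sketch below appends each count to its key. The dictionary is copied by hand so that the snippet runs on its own, and the name `rows` is only illustrative.

```python
# Counts in the dictionary form shown above
# (copied by hand here so the snippet runs on its own).
counts = {(1, 269): 3, (0, 170): 2, (0, 143): 2, (0, 155): 1}

# Append each count to its key to get one row per distinct pair.
rows = [[*key, n] for key, n in counts.items()]

# Optional: sort by the key columns for a deterministic order.
rows.sort(key=lambda row: row[:-1])

print(rows)  # [[0, 143, 2], [0, 155, 1], [0, 170, 2], [1, 269, 3]]
```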

CodePudding user response:

In NumPy you can use numpy.unique with axis=0 and then concatenate the unique rows with their counts.

import numpy as np
m = np.array([[0., 143.],
              [0., 170.],
              [0., 143.],
              [1., 269.],
              [0., 170.],
              [1., 269.],
              [0., 155.],
              [1., 269.]])

rows, counts = np.unique(m, axis=0, return_counts=True)
result = np.concatenate([rows, counts[:, None]], axis=1)

or equivalently

result = np.column_stack(np.unique(m, axis=0, return_counts=True))

which both result in

[[  0.  143.   2.]
 [  0.  155.   1.]
 [  0.  170.   2.]
 [  1.  269.   3.]]
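The `counts[:, None]` indexing may not be obvious, so here is a small sketch of the shapes involved. The arrays are typed in by hand to match the output above rather than computed.

```python
import numpy as np

rows = np.array([[0., 143.], [0., 155.], [0., 170.], [1., 269.]])
counts = np.array([2, 1, 2, 3])

# counts is 1-D with shape (4,); indexing with None adds a trailing axis,
# giving a (4, 1) column that can sit next to the (4, 2) rows.
column = counts[:, None]
print(counts.shape, column.shape)  # (4,) (4, 1)

# Concatenating float rows with integer counts upcasts the counts to float.
result = np.concatenate([rows, column], axis=1)
print(result.shape, result.dtype)  # (4, 3) float64
```

This is why the counts appear as 2. and 3. rather than 2 and 3 in the printed result. numpy.column_stack does the same reshaping of 1-D inputs internally.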

Note that numpy.unique returns the unique rows in sorted order, not in order of first appearance. If the output order needs to match the order of first occurrences, you can reorder the result with numpy.argsort applied to the first-occurrence indices that numpy.unique returns when return_index=True.

rows, indices, counts = np.unique(m, axis=0, return_index=True, return_counts=True)
result = np.concatenate([rows, counts[:, None]], axis=1)[np.argsort(indices)]

which results in

[[  0.  143.   2.]
 [  0.  170.   2.]
 [  1.  269.   3.]
 [  0.  155.   1.]]
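Since the question asks for something like a pandas group-by, the same table can be cross-checked in pandas. This is a sketch that assumes pandas is installed; the column names "a", "b" and "count" are arbitrary placeholders, not part of the original data.

```python
import numpy as np
import pandas as pd

m = np.array([[0., 143.], [0., 170.], [0., 143.], [1., 269.],
              [0., 170.], [1., 269.], [0., 155.], [1., 269.]])

# Group on both columns; sort=False keeps groups in order of first appearance.
df = pd.DataFrame(m, columns=["a", "b"])  # "a" and "b" are placeholder names
grouped = df.groupby(["a", "b"], sort=False).size().reset_index(name="count")

# Back to a plain array: one row per distinct pair, count in the last column.
result = grouped.to_numpy()
print(result)
```

With sort=False the rows come out in first-occurrence order, the same order as the argsort variant above.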

CodePudding user response:

Keep it simple:

from collections import Counter

matrix = [[0., 143.],
          [0., 170.],
          [0., 143.],
          [1., 269.],
          [0., 170.],
          [1., 269.],
          [0., 155.],
          [1., 269.]]

counter = Counter(tuple(vector) for vector in matrix)
matrix_with_count = [[*vector, count] for vector, count in counter.most_common()]
print(matrix_with_count)  # [[1.0, 269.0, 3], [0.0, 143.0, 2], [0.0, 170.0, 2], [0.0, 155.0, 1]]
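Note that most_common() orders the rows by count, highest first. If the order of first occurrence from the question is wanted instead, a small variation is to iterate over the counter's items; the name `by_first_seen` is only illustrative.

```python
from collections import Counter

matrix = [[0., 143.], [0., 170.], [0., 143.], [1., 269.],
          [0., 170.], [1., 269.], [0., 155.], [1., 269.]]

counter = Counter(tuple(vector) for vector in matrix)

# most_common() sorts by count, highest first; items() keeps the order
# in which each distinct row was first seen (guaranteed since Python 3.7).
by_first_seen = [[*vector, count] for vector, count in counter.items()]
print(by_first_seen)
# [[0.0, 143.0, 2], [0.0, 170.0, 2], [1.0, 269.0, 3], [0.0, 155.0, 1]]
```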