count the number of occurance of each one hot code-CodePudding

I have a list of numpy arrays (one-hot represantation) like the example bellow, I want to count the number of occurances of each one-hot code.

[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]

Edit : Expected output :

[1 0 0 0 0 0 0 0 0 0] ==> 1 occurrence
[0 0 1 0 0 0 0 0 0 0] ==> 2 occurrences
[0 1 0 0 0 0 0 0 0 0] ==> 3 occurrences
[0 0 0 0 0 1 0 0 0 0] ==> 1 occurrence
[0 0 0 0 1 0 0 0 0 0] ==> 2 occurrences
[0 0 0 0 0 0 0 0 0 1] ==> 2 occurrences

CodePudding user response：

I think you can get the result you seek:

[1 3 2 1 2 1 0 0 0 2]

indicating the count of occurrences of one hot in that position via a simple column-wise sum using ndarray.sum():

import numpy
data = numpy.array([
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])
print(numpy.ndarray.sum(data, axis=0))

or more compactly as just:

print(data.sum(axis=0))

both should give you:

[1 3 2 1 2 1 0 0 0 2]

CodePudding user response：

Using the face that each row is 1 hot, you can do the following:

temp = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0 ,0 ,0 ,1 ,0 ,0 ,0 ,0 ,0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])

converting the one-hot to indices can be done as follows:

temp2 = np.argmax(temp, axis=1)  # array([2, 2, 1, 5, 1, 4, 9, 4, 0, 3, 1, 9])

and then the counting of the occurances can be done using np.histogram. We know that you have 10 possible values, so we use 10 bins as follows:

temp3 = np.histogram(temp2, bins=10, range=(-0.5,9.5))

np.histogram returns a touple where index [0] holds the histogram values and index [1] holds the bins. In your case:

(array([1, 3, 2, 1, 2, 1, 0, 0, 0, 2]),
 array([-0.5,  0.5,  1.5,  2.5,  3.5,  4.5,  5.5,  6.5,  7.5,  8.5,  9.5]))