Find most common value in numpy 2d array rows, otherwise return maximum

I have an array like this

Nbank = np.array([[2, 3, 1],
                  [1, 2, 2],
                  [3, 2, 1],
                  [3, 2, 1],
                  [2, 3, 2],
                  [2, 2, 3],
                  [1, 1, 3],
                  [2, 1, 1],
                  [2, 2, 3],
                  [1, 1, 1],
                  [2, 1, 1],
                  [2, 3, 1],
                  [1, 2, 1]])

I want to return an array with only one column. The condition is to return the most common value in each row; if multiple values have the same number of occurrences, just return the maximum of them.

I used this code

most_f = np.array([np.bincount(row).argmax() for row in Nbank])

If multiple values have the same number of occurrences, it returns the first (that is, the smallest) of them instead of the maximum. How can I work around this?
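To make the tie behaviour concrete, here is a minimal sketch using the first row of the array above (the variable names are only illustrative). np.bincount indexes its result by value, and argmax returns the first index that holds the maximum, so on a tie the smallest value wins:

```python
import numpy as np

row = np.array([2, 3, 1])      # every value occurs exactly once
counts = np.bincount(row)      # counts[v] = how many times v occurs in the row
first_tied = counts.argmax()   # first index holding the maximum count

print(counts)       # counts for the values 0, 1, 2, 3
print(first_tied)   # the smallest of the tied values, not the largest
```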

CodePudding user response：

You could use a Counter after sorting each row in descending order. Its most_common method returns what you want: values with equal counts keep the order in which they were first encountered, so with the row sorted in descending order the first element is the most frequent value, or the largest one in case of a tie.

import numpy as np
from collections import Counter
Nbank = np.array([[2, 3, 1],
                  [1, 2, 2],
                  [3, 2, 1],
                  [3, 2, 1],
                  [2, 3, 2],
                  [2, 2, 3],
                  [1, 1, 3],
                  [2, 1, 1],
                  [2, 2, 3],
                  [1, 1, 1],
                  [2, 1, 1],
                  [2, 3, 1],
                  [1, 2, 1]])


np.array([Counter(sorted(row, reverse=True)).most_common(1)[0][0] for row in Nbank])

Output

array([3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1])
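As a small check of why the descending sort matters, the sketch below compares a tied row with and without sorting. It assumes Python 3.7 or later, where Counter keeps insertion order for equal counts; the variable names are illustrative only.

```python
from collections import Counter

tied_row = [2, 3, 1]   # all counts equal, so insertion order decides

# Without sorting, the first value encountered wins the tie.
unsorted_pick = Counter(tied_row).most_common(1)[0][0]

# Sorted in descending order, the largest value is encountered first.
sorted_pick = Counter(sorted(tied_row, reverse=True)).most_common(1)[0][0]

# A row with a clear winner is unaffected by the sort.
clear_pick = Counter(sorted([1, 2, 2], reverse=True)).most_common(1)[0][0]

print(unsorted_pick, sorted_pick, clear_pick)
```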

CodePudding user response：

I believe this will solve the problem. You could probably turn it into a one-liner with a list comprehension, but I don't think that would be worthwhile.

most_f = []
for n in Nbank:  # iterate over the rows
    counts = np.bincount(n)  # counts[v] = number of occurrences of value v in this row
    # argwhere lists the tied values in ascending order, so the last one is the largest
    most_f.append(np.argwhere(counts == np.max(counts))[-1][0])
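For reference, here is the same loop wrapped in a function and run on the array from the question. This is only a sketch: the function name most_common_or_max is made up for illustration, and it assumes the rows contain non-negative integers, as np.bincount requires.

```python
import numpy as np

def most_common_or_max(rows):
    """Per row: the most frequent value, or the largest one among tied values."""
    result = []
    for row in rows:
        counts = np.bincount(row)  # counts[v] = occurrences of v in this row
        # Tied values come back in ascending order; take the last (largest).
        result.append(np.argwhere(counts == counts.max())[-1][0])
    return np.array(result)

Nbank = np.array([[2, 3, 1], [1, 2, 2], [3, 2, 1], [3, 2, 1], [2, 3, 2],
                  [2, 2, 3], [1, 1, 3], [2, 1, 1], [2, 2, 3], [1, 1, 1],
                  [2, 1, 1], [2, 3, 1], [1, 2, 1]])

most_f = most_common_or_max(Nbank)
print(most_f)
```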

CodePudding user response：

You can cheat a little and reverse the bin counts of each row, so that np.argmax returns the index of the rightmost occurrence of the maximum count, which corresponds to the largest value:

>>> N = np.max(Nbank)
>>> [N - np.argmax(np.bincount(row, minlength=N + 1)[::-1]) for row in Nbank]
[3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1]

You might also want to avoid loops altogether, which is definitely advisable if you want to take full advantage of numpy. Unfortunately, np.bincount does not support 2D arrays, but you can build the per-row counts manually:

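N, M = Nbank.shape[0], np.max(Nbank) + 1  # number of rows, number of bins
bincount_2D = np.zeros(shape=(N, M), dtype=int)
# pair each element's row index with its value: (row, value) is the cell to increment
advanced_indexing = np.repeat(np.arange(N), Nbank.shape[1]), Nbank.ravel()
np.add.at(bincount_2D, advanced_indexing, 1)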
>>> bincount_2D
array([[0, 1, 1, 1],
       [0, 1, 2, 0],
       [0, 1, 1, 1],
       [0, 1, 1, 1],
       [0, 0, 2, 1],
       [0, 0, 2, 1],
       [0, 2, 0, 1],
       [0, 2, 1, 0],
       [0, 0, 2, 1],
       [0, 3, 0, 0],
       [0, 2, 1, 0],
       [0, 1, 1, 1],
       [0, 2, 1, 0]])

And then repeat the process for all the rows simultaneously:

>>> M - 1 - np.argmax(bincount_2D[:, ::-1], axis=1)
array([3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1], dtype=int64)
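Putting the two loop-free steps together, a self-contained sketch could look like this. The function name rowwise_mode_max is hypothetical, and it assumes a 2D array of non-negative integers whose maximum is small enough that an array of shape (rows, max + 1) fits comfortably in memory.

```python
import numpy as np

def rowwise_mode_max(arr):
    """Most frequent value per row; ties resolve to the largest tied value."""
    n_rows, n_cols = arr.shape
    n_bins = arr.max() + 1
    counts = np.zeros((n_rows, n_bins), dtype=int)
    # Each element adds 1 to the cell (its row, its value).
    row_idx = np.repeat(np.arange(n_rows), n_cols)
    np.add.at(counts, (row_idx, arr.ravel()), 1)
    # argmax on the reversed bins finds the rightmost maximum count,
    # which is then mapped back to the original bin index.
    return n_bins - 1 - np.argmax(counts[:, ::-1], axis=1)

Nbank = np.array([[2, 3, 1], [1, 2, 2], [3, 2, 1], [3, 2, 1], [2, 3, 2],
                  [2, 2, 3], [1, 1, 3], [2, 1, 1], [2, 2, 3], [1, 1, 1],
                  [2, 1, 1], [2, 3, 1], [1, 2, 1]])

result = rowwise_mode_max(Nbank)
print(result)
```

np.add.at is used rather than plain fancy-index assignment because it accumulates repeated (row, value) pairs instead of writing each of them only once.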

CodePudding user response：

This gives the maximum value present in the whole array (a single number, not a per-row result):

Nbank.max()