I have an array like this
Nbank = np.array([[2, 3, 1],
[1, 2, 2],
[3, 2, 1],
[3, 2, 1],
[2, 3, 2],
[2, 2, 3],
[1, 1, 3],
[2, 1, 1],
[2, 2, 3],
[1, 1, 1],
[2, 1, 1],
[2, 3, 1],
[1, 2, 1]])
I want to return an array with only one column. The condition is to return the most common value in each row; if multiple values have the same number of occurrences, just return the maximum of them.
I used this code
most_f = np.array([np.bincount(row).argmax() for row in Nbank])
If multiple values have the same number of occurrences, it returns the smallest of them (the first index with the highest count) instead of the maximum. How can I work around this?
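To make the tie behaviour concrete, here is a minimal sketch for a single row in which every value occurs once. The names `row`, `counts` and `tie_result` are only for illustration.

```python
import numpy as np

row = np.array([2, 3, 1])          # every value occurs exactly once
counts = np.bincount(row)          # counts[v] = number of times v occurs in row
tie_result = int(counts.argmax())  # argmax returns the FIRST index with the highest count

print(counts.tolist())  # [0, 1, 1, 1]
print(tie_result)       # 1, i.e. the smallest tied value, not the wanted 3
```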
CodePudding user response:
You could use a Counter after sorting each row in descending order. Its most_common method returns what you want: elements with equal counts are listed in the order they were first encountered, so after the descending sort the first element is the most frequent value and, in case of a tie, the largest one.
import numpy as np
from collections import Counter
Nbank = np.array([[2, 3, 1],
[1, 2, 2],
[3, 2, 1],
[3, 2, 1],
[2, 3, 2],
[2, 2, 3],
[1, 1, 3],
[2, 1, 1],
[2, 2, 3],
[1, 1, 1],
[2, 1, 1],
[2, 3, 1],
[1, 2, 1]])
np.array([Counter(sorted(row, reverse=True)).most_common(1)[0][0] for row in Nbank])
Output
array([3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1])
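As a rough sketch of why the sort matters, the same idea can be wrapped in a small helper and tried on a tied row and on a row with a clear winner. The helper name `most_common_max` is made up for this example, and it relies on `most_common` listing equal counts in first-encountered order (documented for Python 3.7 and later).

```python
from collections import Counter

def most_common_max(row):
    """Most frequent value in row; ties go to the largest value."""
    # Descending sort: larger values are inserted into the Counter first,
    # so they come first among entries with equal counts.
    ordered = sorted((int(v) for v in row), reverse=True)
    return Counter(ordered).most_common(1)[0][0]

tie = most_common_max([2, 3, 1])    # all counts equal, largest value wins
clear = most_common_max([1, 2, 2])  # 2 occurs twice, so it wins outright
print(tie, clear)  # 3 2
```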
CodePudding user response:
I believe this will solve the problem. You could probably make it into a one-liner with some fancy list comprehension, but I don't think that would be worthwhile.
most_f = []
for n in Nbank:  # iterate over the rows
    counts = np.bincount(n)  # count the occurrences of each value in the row
    # argwhere lists the tied values in ascending order; take the last (largest) one
    most_f.append(np.argwhere(counts == np.max(counts))[-1][0])
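For what it's worth, a possible one-liner-style variant could look like the sketch below. It swaps np.argwhere for np.flatnonzero, which returns the matching indices as a flat array; the helper name `most_frequent_max` and the small test array `Nbank_small` are assumptions made for this example.

```python
import numpy as np

def most_frequent_max(row):
    """Most frequent value in a row of non-negative ints; ties go to the largest."""
    counts = np.bincount(row)
    # Indices (= values) whose count equals the highest count, in ascending order
    tied = np.flatnonzero(counts == counts.max())
    return int(tied[-1])

Nbank_small = np.array([[2, 3, 1],
                        [1, 2, 2],
                        [1, 1, 3]])
result = [most_frequent_max(row) for row in Nbank_small]
print(result)  # [3, 2, 1]
```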
CodePudding user response:
You can cheat a little bit and reverse each row's bin counts so that np.argmax
returns the index of the rightmost occurrence of the highest count, which corresponds to the largest item:
N = np.max(Nbank)
>>> [N - np.argmax(np.bincount(row, minlength=N + 1)[::-1]) for row in Nbank]
[3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1]
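Here is a step-by-step sketch of that trick for a single row, in case the index arithmetic is not obvious. The intermediate names are only illustrative, and `N` is hard-coded to 3, the largest value in the example array. The `minlength=N + 1` argument matters because it makes every count vector the same length, so position `i` in the reversed counts always stands for the value `N - i`.

```python
import numpy as np

row = np.array([2, 3, 1])
N = 3  # largest value in the whole array (assumed known here)

counts = np.bincount(row, minlength=N + 1)   # [0, 1, 1, 1]
reversed_counts = counts[::-1]               # position i now refers to value N - i
first_max = int(np.argmax(reversed_counts))  # first hit in reversed order = largest tied value
value = N - first_max                        # map the position back to the value

print(value)  # 3
```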
You might also want to avoid loops, which is advisable if you want to take full advantage of NumPy. Unfortunately, np.bincount does not support 2D arrays, but you can do it manually:
N, M = Nbank.shape[0], np.max(Nbank) + 1
bincount_2D = np.zeros(shape=(N, M), dtype=int)
advanced_indexing = np.repeat(np.arange(N), Nbank.shape[1]), Nbank.ravel()
np.add.at(bincount_2D, advanced_indexing, 1)
>>> bincount_2D
array([[0, 1, 1, 1],
[0, 1, 2, 0],
[0, 1, 1, 1],
[0, 1, 1, 1],
[0, 0, 2, 1],
[0, 0, 2, 1],
[0, 2, 0, 1],
[0, 2, 1, 0],
[0, 0, 2, 1],
[0, 3, 0, 0],
[0, 2, 1, 0],
[0, 1, 1, 1],
[0, 2, 1, 0]])
And then repeat the process for all the rows simultaneously:
>>> M - 1 - np.argmax(bincount_2D[:, ::-1], axis=1)
array([3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1], dtype=int64)
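As an alternative sketch, the per-row counts can also be built with a broadcast comparison instead of np.add.at. This is a different technique from the one above, it assumes non-negative integers, and it allocates a temporary boolean array of shape (rows, columns, max value + 1), so it only suits small value ranges. The function name `rowwise_mode_max` is made up for this example.

```python
import numpy as np

def rowwise_mode_max(arr):
    """Per-row most frequent value; ties go to the largest value."""
    arr = np.asarray(arr)
    M = int(arr.max()) + 1
    # counts[i, v] = number of times value v occurs in row i
    counts = (arr[:, :, None] == np.arange(M)).sum(axis=1)
    # Reverse the value axis so argmax picks the largest value among ties
    return M - 1 - counts[:, ::-1].argmax(axis=1)

Nbank = np.array([[2, 3, 1], [1, 2, 2], [3, 2, 1], [3, 2, 1], [2, 3, 2],
                  [2, 2, 3], [1, 1, 3], [2, 1, 1], [2, 2, 3], [1, 1, 1],
                  [2, 1, 1], [2, 3, 1], [1, 2, 1]])
result = rowwise_mode_max(Nbank)
print(result.tolist())  # [3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1]
```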
CodePudding user response:
This gives the maximum value present in the whole array (note that it is the overall maximum, not the most common value of each row):
Nbank.max()