I am trying to find the most common ndarray in a list of ndarrays.
I tried using the mostCommon() function but I got this error: TypeError: unhashable type: 'numpy.ndarray'.
Any ideas on how to tackle this problem?
Example list:
a = [array([1, 2, 3]),array([1, 10,30,2, 3]),array([1, 2, 3])]
I want it to print the most common ndarray: array([1, 2, 3])
CodePudding user response:
NumPy is not well suited to this kind of computation because the input consists of variable-sized arrays (aka a jagged array). Moreover, the general approach to such a problem is either to sort the items or to use a hash table, but NumPy arrays of different sizes cannot be natively compared, and they are not trivially hashable either.
One trick to deal with this is to convert the arrays to tuples, which are hashable, and then use a dictionary to count the items. Here is the resulting code:
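from collections import Counter
import numpy as np
a = [np.array([1, 2, 3]), np.array([1, 10,30,2, 3]), np.array([1, 2, 3])]
# Tuples of numbers are hashable, so Counter can count them (this works for 1-D arrays).
# most_common(1) returns a list holding the single most frequent (tuple, count) pair.
result = np.asarray(Counter(map(tuple, a)).most_common(1)[0][0])
# result = array([1, 2, 3])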
Note that this is not very efficient for large lists, but using a list of small NumPy arrays is what prevents the computation from being fast in the first place. At least the solution is simple.
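If you also need the count, or your arrays are not all 1-D, a variant of the same idea is to hash each array's raw bytes together with its shape and dtype instead of converting it to a tuple. The following is only a sketch: most_common_array is a hypothetical helper name (not a NumPy function), and it assumes the list is non-empty.

```python
from collections import Counter

import numpy as np


def most_common_array(arrays):
    """Return (array, count) for the most frequent array in a non-empty list.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    counts = Counter()
    first_seen = {}
    for arr in arrays:
        arr = np.asarray(arr)
        # Shape and dtype are part of the key so that arrays with the same
        # raw bytes but a different layout or type are not counted together.
        key = (arr.shape, arr.dtype.str, arr.tobytes())
        counts[key] += 1
        # Remember the first array seen for each key so it can be returned.
        first_seen.setdefault(key, arr)
    key, count = counts.most_common(1)[0]
    return first_seen[key], count


a = [np.array([1, 2, 3]), np.array([1, 10, 30, 2, 3]), np.array([1, 2, 3])]
winner, count = most_common_array(a)
print(winner, count)  # prints: [1 2 3] 2
```

Because the dtype is part of the key, np.array([1, 2, 3]) and np.array([1.0, 2.0, 3.0]) are counted as different arrays here. Whether that is what you want depends on your data.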