I have a 2D numpy array and I'm trying to get its mode along axis = 0 (rows). However, I would like the most frequent unique row as a whole, not the three separate modes of the three columns, which is what scipy stats mode returns. The desired output in the example below would be [9,9,9], because that's the most common unique row. Thanks
```
import numpy as np
from scipy import stats
arr1 = np.array([[2,3,4],[2,1,5],[1,2,3],[2,4,4],[2,8,2],[2,3,1],[9,9,9],[9,9,9]])
stats.mode(arr1, axis = 0)
# output:
# ModeResult(mode=array([[2, 3, 4]]), count=array([[5, 2, 2]]))
```
CodePudding user response:
You could use the numpy unique function along axis 0 and have it return the counts:
unique_arr1, count = np.unique(arr1,axis=0, return_counts=True)
unique_arr1[np.argmax(count)]
output:
array([9, 9, 9])
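np.unique returns the unique rows in sorted (lexicographic) order, which means the last one is guaranteed to be the largest row, not the most frequent one. In this example the two happen to coincide, so the following also gives [9, 9, 9], but it should not be relied on in general: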
out = np.unique(arr1,axis=0)[-1]
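The sorted order refers to the row values, not to the counts, so it is worth checking the shortcut on another input. A minimal sketch, assuming a made-up array `arr2` (not from the question) in which the most frequent row is not the largest one:

```python
import numpy as np

# Hypothetical array for illustration: [1, 1, 1] occurs three times,
# while [9, 9, 9] occurs once and is the largest row in sorted order.
arr2 = np.array([[1, 1, 1], [9, 9, 9], [1, 1, 1], [5, 0, 2], [1, 1, 1]])

unique_rows, counts = np.unique(arr2, axis=0, return_counts=True)

most_frequent = unique_rows[np.argmax(counts)]  # row with the highest count
last_sorted = unique_rows[-1]                   # lexicographically largest row

print(most_frequent)
print(last_sorted)
```

Here the two results differ, so the argmax over the counts is the one to use when the most frequent row is what you want.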
I do not know what you want to use this for, but note that keeping the counts lets you verify the result, or account for several rows that share the same highest count.
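If ties matter, one possible way to get every row that shares the highest count is a boolean mask on the counts. This is only a sketch; the helper name `most_frequent_rows` and the array `tied_arr` are made up for illustration:

```python
import numpy as np

def most_frequent_rows(a):
    """Return all unique rows of `a` that share the highest count, plus that count."""
    unique_rows, counts = np.unique(a, axis=0, return_counts=True)
    top = counts.max()
    # The mask keeps every row whose count equals the maximum, not just the first.
    return unique_rows[counts == top], int(top)

arr1 = np.array([[2, 3, 4], [2, 1, 5], [1, 2, 3], [2, 4, 4],
                 [2, 8, 2], [2, 3, 1], [9, 9, 9], [9, 9, 9]])
rows, top_count = most_frequent_rows(arr1)

# Hypothetical input with two rows tied for the highest count.
tied_arr = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6], [0, 0, 0]])
tied_rows, tied_count = most_frequent_rows(tied_arr)

print(rows, top_count)
print(tied_rows, tied_count)
```

np.argmax would only report the first of the tied rows, so the mask is the safer choice when you need to know about all of them.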
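Update, given the additional information that this is for images (which could be big) and, most importantly, that the values along the last dimension can be packed into a single int (each value is uint8 or uint16, so three of them fit in an int32 or int64). Assuming each pixel value is uint8, and arr is the image array with the three channels in its last dimension: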
pixel, count = np.unique(np.dot(arr, np.array([2**16,2**8,1])), return_counts=True)
pixel = pixel[np.argmax(count)]
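b, g, r = np.ndarray((3,), buffer=pixel, dtype=np.uint8)
This could result in a big speedup, since np.unique then only has to sort a 1-D array of integers instead of whole rows. Note that reading the bytes back this way assumes a little-endian machine, and the three values come out in the reverse of the channel order in arr.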
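For reference, here is a self-contained sketch of the same packing idea that unpacks with shifts and masks instead of reading raw bytes, so it does not depend on byte order. The tiny array `img` and the names `weights`, `packed` and `most_common_pixel` are assumptions for illustration, not part of the question:

```python
import numpy as np

# Hypothetical 2x3 "image" with three uint8 channels per pixel.
img = np.array([[[10, 20, 30], [10, 20, 30], [1, 2, 3]],
                [[255, 0, 7], [10, 20, 30], [1, 2, 3]]], dtype=np.uint8)

# Pack each pixel into one integer: c0 * 2**16 + c1 * 2**8 + c2.
weights = np.array([2**16, 2**8, 1], dtype=np.int64)
packed = np.dot(img.astype(np.int64), weights)  # shape (2, 3), one int per pixel

# np.unique flattens the packed array and counts each distinct value.
values, counts = np.unique(packed, return_counts=True)
best = int(values[np.argmax(counts)])

# Unpack with shifts and masks; the result is in the same channel order as img.
c0, c1, c2 = (best >> 16) & 0xFF, (best >> 8) & 0xFF, best & 0xFF
most_common_pixel = np.array([c0, c1, c2], dtype=np.uint8)
print(most_common_pixel)
```

Casting to int64 before the dot product keeps the packed values from overflowing regardless of the platform's default integer size.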