I have data in an array like so:
array([[ 5, 5, 5, 6, 9, 6, 6],
[10, 4, 10, 3, 5, 3, 3],
[10, 3, 10, 4, 5, 3, 4],
[ 9, 6, 8, 8, 10, 6, 9],
[10, 10, 10, 7, 10, 4, 4],
[10, 6, 10, 5, 9, 7, 5],
[ 9, 7, 10, 7, 10, 8, 10],
[ 8, 5, 10, 7, 10, 7, 10],
[ 7, 10, 10, 9, 10, 7, 8]])
I want to sort each row in ascending order, and then order the rows by the number of 10s they contain, in descending order:
arr = np.sort(arr, axis=1)
arr = arr[(arr==10).sum(axis=1).argsort()][::-1]
Output:
array([[ 4, 4, 7, 10, 10, 10, 10],
[ 7, 7, 8, 9, 10, 10, 10],
[ 5, 7, 7, 8, 10, 10, 10],
[ 7, 7, 8, 9, 10, 10, 10],
[ 5, 5, 6, 7, 9, 10, 10],
[ 3, 3, 4, 4, 5, 10, 10],
[ 3, 3, 3, 4, 5, 10, 10],
[ 6, 6, 8, 8, 9, 9, 10],
[ 5, 5, 5, 6, 6, 6, 9]])
I want to implement a tie-breaker so that if the number of 10s is the same, the rows are then ordered by the number of 9s, then 8s, and so on. Expected output:
array([[ 4, 4, 7, 10, 10, 10, 10],
[ 7, 7, 8, 9, 10, 10, 10],
[ 7, 7, 8, 9, 10, 10, 10],
[ 5, 7, 7, 8, 10, 10, 10],
[ 5, 5, 6, 7, 9, 10, 10],
[ 3, 3, 4, 4, 5, 10, 10],
[ 3, 3, 3, 4, 5, 10, 10],
[ 6, 6, 8, 8, 9, 9, 10],
[ 5, 5, 5, 6, 6, 6, 9]])
CodePudding user response:
You can achieve it with numpy.frompyfunc.
The basic idea is to construct an array with one element per row, where each element is a tuple containing that row's number of 10s, 9s, and so on. Then apply numpy.argsort to this array and use the resulting order to index the rows.
import numpy as np
arr = np.array([[ 5, 5, 5, 6, 9, 6, 6],
[10, 4, 10, 3, 5, 3, 3],
[10, 3, 10, 4, 5, 3, 4],
[ 9, 6, 8, 8, 10, 6, 9],
[10, 10, 10, 7, 10, 4, 4],
[10, 6, 10, 5, 9, 7, 5],
[ 9, 7, 10, 7, 10, 8, 10],
[ 8, 5, 10, 7, 10, 7, 10],
[ 7, 10, 10, 9, 10, 7, 8]])
arr = np.sort(arr, 1)
keys = sorted(set(arr.ravel()), reverse=True)
def make_tuple(*argv):
    return tuple(argv)
ufunc = np.frompyfunc(make_tuple, len(keys), 1)
cnt_array = ufunc(*[(arr == k).sum(1) for k in keys])
result = arr[cnt_array.argsort()[::-1]]
print(result)
# [[ 4 4 7 10 10 10 10]
# [ 7 7 8 9 10 10 10]
# [ 7 7 8 9 10 10 10]
# [ 5 7 7 8 10 10 10]
# [ 5 5 6 7 9 10 10]
# [ 3 3 4 4 5 10 10]
# [ 3 3 3 4 5 10 10]
# [ 6 6 8 8 9 9 10]
# [ 5 5 5 6 6 6 9]]
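This works because Python compares tuples lexicographically: the first position where two tuples differ decides the order, which is exactly the tie-breaker behaviour asked for. A minimal sketch, using hypothetical count tuples (number of 10s, 9s, 8s) that are made up for illustration and are not taken from the data above:

```python
# Hypothetical (count of 10s, count of 9s, count of 8s) tuples for three rows.
counts = [(3, 0, 1), (3, 1, 1), (2, 2, 0)]

# Tuples are compared element by element, so the count of 10s is checked
# first, and the count of 9s only matters when the 10s are tied.
ranked = sorted(counts, reverse=True)
print(ranked)
# [(3, 1, 1), (3, 0, 1), (2, 2, 0)]
```

The two rows with three 10s are separated by their count of 9s, and the row with only two 10s comes last regardless of its other counts.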
CodePudding user response:
The simplest way to do that in numpy is to sort by the least important tie-breaker first, and work up to the most important sort criterion.
E.g. if you want to sort by number of 10s, then tie-break by 9s, then tie-break by 8s, then you could do this:
arr = np.sort(arr, axis=1)
arr = arr[(arr==8).sum(axis=1).argsort(kind='stable')]
arr = arr[(arr==9).sum(axis=1).argsort(kind='stable')]
arr = arr[(arr==10).sum(axis=1).argsort(kind='stable')]
arr = arr[::-1]
Keep in mind that you need kind='stable' on each argsort. A stable sort keeps the existing order of elements that are tied under the current criterion, so the ordering produced by the earlier, less important sorts is preserved. The default sort method, quicksort, is not stable.
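The three-step cascade above stops at 8s. To break ties all the way down ("and so on" in the question), the same idea can be written as a loop over every distinct value, from least to most important. This is a sketch rather than a definitive implementation; the function name sort_by_counts is hypothetical, and it assumes larger values always take priority:

```python
import numpy as np

arr = np.array([[ 5,  5,  5,  6,  9,  6,  6],
                [10,  4, 10,  3,  5,  3,  3],
                [10,  3, 10,  4,  5,  3,  4],
                [ 9,  6,  8,  8, 10,  6,  9],
                [10, 10, 10,  7, 10,  4,  4],
                [10,  6, 10,  5,  9,  7,  5],
                [ 9,  7, 10,  7, 10,  8, 10],
                [ 8,  5, 10,  7, 10,  7, 10],
                [ 7, 10, 10,  9, 10,  7,  8]])

def sort_by_counts(a):
    """Hypothetical helper: order rows by count of the largest value,
    breaking ties by the count of each smaller value in turn."""
    a = np.sort(a, axis=1)
    # np.unique returns the distinct values in ascending order, so the
    # least important criterion (smallest value) is applied first and
    # the most important one (largest value) last.
    for value in np.unique(a):
        counts = (a == value).sum(axis=1)
        a = a[counts.argsort(kind='stable')]
    # Every pass sorted ascending; reverse once to get descending counts.
    return a[::-1]

result = sort_by_counts(arr)
print(result)
```

For the array in the question this yields the expected output. np.lexsort, which treats the last key in its sequence as the primary one, would be another way to express the same cascade in a single call.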