I want to rank the rows of a 2D matrix lexicographically (in descending order) such that they are ranked: first by the left-most column, then second left-most column, and so on, with equal rankings being shared by rows which have the same value across all columns (making it different to a sorting problem).
I can achieve this using a two-stage approach of sorting the rows lexicographically first, then iterating over the array of sorted indices to assign ranks accordingly:
import numpy as np
A = np.array([[100, 50, 3], #0
[200, 40, 2], #1
[100, 30, 7], #2
[300, 40, 4], #3
[200, 40, 2], #4
[200, 20, 3], #5
[100, 10, 6], #6
[100, 30, 7]]) #7
sorted_rows = np.lexsort(([-A[:,i] for i in range(A.shape[1]-1,-1,-1)]))
rank = 0
ranks = np.array([0 for _ in range(A.shape[0])])
for i in range(1,len(sorted_rows)):
if (A[sorted_rows[i],:] != A[sorted_rows[i-1],:]).any():
rank = rank 1
ranks[sorted_rows[i]] = rank
print(ranks)
gives the output:
[3 1 4 0 1 2 5 4]
which is correct, but I was wondering if there was a better way of doing this in a single step?
CodePudding user response:
Here's one way you can do it, with just a couple steps and no explicit Python loops, but it relies on using a function from SciPy. The idea is to convert each row into a single number, and rank the numbers using scipy.stats.rankdata
. The conversion of each row to a single number is accomplished using the idea of a mixed radix number system.
Here's a demonstration in an ipython session. First, your data:
In [44]: A = np.array([[100, 50, 3], #0
...: [200, 40, 2], #1
...: [100, 30, 7], #2
...: [300, 40, 4], #3
...: [200, 40, 2], #4
...: [200, 20, 3], #5
...: [100, 10, 6], #6
...: [100, 30, 7]]) #7
Get the "bases" of mixed radix coefficients for each column; each value is the maximum of the column to the right plus one, and the last value is set to 1.
In [45]: b = np.concatenate((A[:, 1:].max(axis=0) 1, [1]))
In [46]: b
Out[46]: array([51, 8, 1])
Form the coefficients that will be used to convert the rows of A
to the scalar values. This is the cumulative product from right to left of b
:
In [47]: c = np.cumprod(b[::-1])[::-1]
In [48]: c
Out[48]: array([408, 8, 1])
Convert each row to its value using the mixed radix representation, and multiply by -1 because you want the results in descending order:
In [49]: v = -A @ c
In [50]: v
array([ -41203, -81922, -41047, -122724, -81922, -81763, -40886,
-41047])
Use scipy.stats.rankdata
with method='dense'
to compute the ranks (one is subtracted from the result because rankdata
starts its ranks at 1):
In [51]: from scipy.stats import rankdata
In [52]: rankdata(v, method='dense') - 1
Out[52]: array([3, 1, 4, 0, 1, 2, 5, 4])
The short version, with just the code:
b = np.concatenate((A[:, 1:].max(axis=0) 1, [1]))
c = np.cumprod(b[::-1])[::-1]
ranks = rankdata(-A @ c, method='dense') - 1
CodePudding user response:
The return_inverse
option to np.unique()
produces exactly what you want:
ranks = np.unique(-A, axis=0, return_inverse=True)[1]