Ranking (not sorting) rows of 2D matrix lexicographically with numpy-CodePudding

I want to rank the rows of a 2D matrix lexicographically (in descending order) such that they are ranked: first by the left-most column, then second left-most column, and so on, with equal rankings being shared by rows which have the same value across all columns (making it different to a sorting problem).

I can achieve this using a two-stage approach of sorting the rows lexicographically first, then iterating over the array of sorted indices to assign ranks accordingly:

import numpy as np

A = np.array([[100, 50, 3],  #0
              [200, 40, 2],  #1
              [100, 30, 7],  #2
              [300, 40, 4],  #3
              [200, 40, 2],  #4
              [200, 20, 3],  #5
              [100, 10, 6],  #6
              [100, 30, 7]]) #7

sorted_rows = np.lexsort(([-A[:,i] for i in range(A.shape[1]-1,-1,-1)]))

rank = 0
ranks = np.array([0 for _ in range(A.shape[0])])

for i in range(1,len(sorted_rows)):
    if (A[sorted_rows[i],:] != A[sorted_rows[i-1],:]).any():
        rank = rank   1
    ranks[sorted_rows[i]] = rank

print(ranks) gives the output:

[3 1 4 0 1 2 5 4]

which is correct, but I was wondering if there was a better way of doing this in a single step?

CodePudding user response：

Here's one way you can do it, with just a couple steps and no explicit Python loops, but it relies on using a function from SciPy. The idea is to convert each row into a single number, and rank the numbers using scipy.stats.rankdata. The conversion of each row to a single number is accomplished using the idea of a mixed radix number system.

Here's a demonstration in an ipython session. First, your data:

In [44]: A = np.array([[100, 50, 3],  #0
    ...:               [200, 40, 2],  #1
    ...:               [100, 30, 7],  #2
    ...:               [300, 40, 4],  #3
    ...:               [200, 40, 2],  #4
    ...:               [200, 20, 3],  #5
    ...:               [100, 10, 6],  #6
    ...:               [100, 30, 7]]) #7

Get the "bases" of mixed radix coefficients for each column; each value is the maximum of the column to the right plus one, and the last value is set to 1.

In [45]: b = np.concatenate((A[:, 1:].max(axis=0)   1, [1]))

In [46]: b
Out[46]: array([51,  8,  1])

Form the coefficients that will be used to convert the rows of A to the scalar values. This is the cumulative product from right to left of b:

In [47]: c = np.cumprod(b[::-1])[::-1]

In [48]: c
Out[48]: array([408,   8,   1])

Convert each row to its value using the mixed radix representation, and multiply by -1 because you want the results in descending order:

In [49]: v = -A @ c

In [50]: v
array([ -41203,  -81922,  -41047, -122724,  -81922,  -81763,  -40886,
        -41047])

Use scipy.stats.rankdata with method='dense' to compute the ranks (one is subtracted from the result because rankdata starts its ranks at 1):

In [51]: from scipy.stats import rankdata

In [52]: rankdata(v, method='dense') - 1
Out[52]: array([3, 1, 4, 0, 1, 2, 5, 4])

The short version, with just the code:

b = np.concatenate((A[:, 1:].max(axis=0)   1, [1]))
c = np.cumprod(b[::-1])[::-1]
ranks = rankdata(-A @ c, method='dense') - 1

CodePudding user response：

The return_inverse option to np.unique() produces exactly what you want:

ranks = np.unique(-A, axis=0, return_inverse=True)[1]