Sort array with repeated values-CodePudding

I have to order an array with values from 0 to 9 that are repeated and obtain the vector initial index. The input array is:

[3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]

I would like to obtain the following order:

array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9], dtype=uint8)

Instead of:

array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 9, 9])

which is given by:

import numpy as np
a = [3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
np.argsort(a)

Is there a way to manipulate this function?

CodePudding user response：

l = [1,2,3,4,5,6,7,8,9,4,3,5]
l_oredered = []
while len(l) != 0:
    unique_nums = list(set(l))
    unique_nums.sort()
    l_oredered.extend(unique_nums)
    for num in unique_nums:
        l.remove(num)

print(l_oredered)

This will result with:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5]

You can apply the thinking with NumPy or convert the final result into a NumPy array.

CodePudding user response：

For an array with a small number of elements @Mr.O's answer is faster. The code below is faster if there are more than around 100 ints in arr.

import numpy as np

def sort_groups( arr ):
    ct = np.ones( len(arr), dtype = np.int64 )
    for i in set( arr ):
        ct[arr == i] = ct[arr == i ].cumsum()
    # ct calculates a rank for each int in arr
    tosort = ( arr.max()   1 ) * ct   arr 
    # tosort ranks by ct first then a if ct's are equal
    return arr[ np.argsort( tosort ) ]
     
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
sort_groups( a )
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])

Breaking the function out to see what's happening:

arr = a

ct = np.ones( len(arr), dtype = np.int64 )
for i in set( arr ):
    ct[arr == i] = ct[arr == i ].cumsum()

arr, ct
# (array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]),
#  array([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]))

tosort = ( arr.max()   1 ) * ct   arr  # Assumes arr is > 0

tosort
# array([13, 11, 12, 10, 14, 15, 16, 17, 21, 20, 19, 25, 23, 29, 22, 27, 26, 24])

arr[ np.argsort( tosort ) ]
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])

CodePudding user response：

Very interesting task! Here is my attempt at to solve the problem

import numpy as np


def groupsort(a: np.ndarray):
    uniques, counts = np.unique(a, return_counts=True)
    min_count = np.min(counts)
    counts -= min_count
    n_easy = min_count * len(uniques)

    # Pre allocate array
    values = np.empty(n_easy   counts.sum(), dtype=a.dtype)

    # Set easy values
    temp = values[:n_easy].reshape(min_count, len(uniques))
    temp[:] = uniques

    # Set hard values
    i = n_easy
    while np.any(mask := counts > 0): # Python 3.8 syntax
        masksum = mask.sum()
        values[i : i   masksum] = uniques[mask]
        counts -= mask
        i  = masksum
    return values


a = np.array(list(range(4)) * 2   [1, 1, 1, 1, 1, 0, 0, 0, 2, 2])
np.random.shuffle(a)
print(a)
# [3 0 1 0 1 0 2 1 2 1 0 2 1 1 2 0 3 1]
print(groupsort(a))
# [0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 1 1]

The idea is to split the problem in two cases. One easy case and one hard case. The easy case is to handle inputs like this: a = [0,1,2,3,0,1,2,3], where the counts for each unique value are equal. Then you can simply count the number n of a specific value (e.g. 0), then just do list(range(max(a))) * n.

The hard case is to handle inputs such as a = [1,1,1,1,1,0,0,0,2,2]. Then the idea is to get the counts of each value, in this case counts = [3,5,2,0]. Then do:

rest = []
i = 0
while any(counts):
    if counts[i]:
        rest.append(i)
        counts[i] -= 1
    i = (i   1) % 3

In my solution you see I have combined the two solutions, since you can infer the input for the hard case by solving the easy case first.

CodePudding user response：

IIUC, you want to have a duplicated, sorted, array.

Remove the duplicated values using numpy.unique, sort, and tile to the expected size:

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = np.unique(a)
b = np.tile(b, len(a)//len(b))

output:

array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])