I have to order an array with values from 0 to 9 that are repeated and obtain the vector initial index. The input array is:
[3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
I would like to obtain the following order:
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9], dtype=uint8)
Instead of:
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 9, 9])
which is given by:
import numpy as np
a = [3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
np.argsort(a)
Is there a way to manipulate this function?
CodePudding user response:
l = [1,2,3,4,5,6,7,8,9,4,3,5]
l_oredered = []
while len(l) != 0:
unique_nums = list(set(l))
unique_nums.sort()
l_oredered.extend(unique_nums)
for num in unique_nums:
l.remove(num)
print(l_oredered)
This will result with:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5]
You can apply the thinking with NumPy or convert the final result into a NumPy array.
CodePudding user response:
For an array with a small number of elements @Mr.O's answer is faster. The code below is faster if there are more than around 100 ints in arr.
import numpy as np
def sort_groups( arr ):
ct = np.ones( len(arr), dtype = np.int64 )
for i in set( arr ):
ct[arr == i] = ct[arr == i ].cumsum()
# ct calculates a rank for each int in arr
tosort = ( arr.max() 1 ) * ct arr
# tosort ranks by ct first then a if ct's are equal
return arr[ np.argsort( tosort ) ]
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
sort_groups( a )
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
Breaking the function out to see what's happening:
arr = a
ct = np.ones( len(arr), dtype = np.int64 )
for i in set( arr ):
ct[arr == i] = ct[arr == i ].cumsum()
arr, ct
# (array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]),
# array([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]))
tosort = ( arr.max() 1 ) * ct arr # Assumes arr is > 0
tosort
# array([13, 11, 12, 10, 14, 15, 16, 17, 21, 20, 19, 25, 23, 29, 22, 27, 26, 24])
arr[ np.argsort( tosort ) ]
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
CodePudding user response:
Very interesting task! Here is my attempt at to solve the problem
import numpy as np
def groupsort(a: np.ndarray):
uniques, counts = np.unique(a, return_counts=True)
min_count = np.min(counts)
counts -= min_count
n_easy = min_count * len(uniques)
# Pre allocate array
values = np.empty(n_easy counts.sum(), dtype=a.dtype)
# Set easy values
temp = values[:n_easy].reshape(min_count, len(uniques))
temp[:] = uniques
# Set hard values
i = n_easy
while np.any(mask := counts > 0): # Python 3.8 syntax
masksum = mask.sum()
values[i : i masksum] = uniques[mask]
counts -= mask
i = masksum
return values
a = np.array(list(range(4)) * 2 [1, 1, 1, 1, 1, 0, 0, 0, 2, 2])
np.random.shuffle(a)
print(a)
# [3 0 1 0 1 0 2 1 2 1 0 2 1 1 2 0 3 1]
print(groupsort(a))
# [0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 1 1]
The idea is to split the problem in two cases. One easy case and one hard case. The easy case is to handle inputs like this: a = [0,1,2,3,0,1,2,3]
, where the counts for each unique value are equal. Then you can simply count the number n
of a specific value (e.g. 0), then just do list(range(max(a))) * n
.
The hard case is to handle inputs such as a = [1,1,1,1,1,0,0,0,2,2]
. Then the idea is to get the counts of each value, in this case counts = [3,5,2,0]
. Then do:
rest = []
i = 0
while any(counts):
if counts[i]:
rest.append(i)
counts[i] -= 1
i = (i 1) % 3
In my solution you see I have combined the two solutions, since you can infer the input for the hard case by solving the easy case first.
CodePudding user response:
IIUC, you want to have a duplicated, sorted, array.
Remove the duplicated values using numpy.unique
, sort, and tile
to the expected size:
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = np.unique(a)
b = np.tile(b, len(a)//len(b))
output:
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])