Selecting leftmost unique elements from multiple lists until given size is reached

I have the following lists:

a = [ 1,  6, 76, 15, 46, 55, 47, 15, 72, 58, ..] # there could be more than 10 elements in each
b = [17, 48, 22,  7, 35, 19, 91, 85, 49, 35, ..]
c = [46,  8, 53, 49, 28, 82, 30, 86, 57,  9, ..]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99, ..]
e = [ 1, 53, 17, 82, 21, 20, 88, 10, 82, 41, ..]

I want to write a function that takes any number of those lists (could be all of them, could be only a and c, for example) as its arguments and selects the leftmost 10 unique elements, drawn equally from every list. For example, I will illustrate with pictures.

The initial data (assuming a length of 10):

We look at the first element of every row and see that a and e have the same value. We randomly pick one of them, say e, remove that element, shift e to the left, and get this:

Here we see that there is an overlap again: 17 already appears, so we shift e one more time.

Again a similar problem, and we shift it one last time.

Finally, we can select the first two elements of each list and there will be no duplicates:

[1, 6, 17, 48, 46, 8, 82, 12, 21, 53]

Several lists could contain identical values; the same rules should apply.

I came up with this, and to handle the randomness I decided to shuffle the lists before using them:

def prepare_unique_array(
    arrays: list = [], 
    max_length: int = 10, 
    slice_number: int = 2
):
    unique_array = []

    for array in arrays:
        for i in range(slice_number):
            while not len(unique_array) == max_length:
                if array[i] not in unique_array:
                    unique_array.append(array[i])
                else:
                    while array[i + 1] in unique_array:
                        i += 1
                    unique_array.append(array[i + 1])

    return unique_array

This gives the desired result for those initial values, but as soon as anything changes, it stops working.

Maybe there is a NumPy approach that does this faster and more easily.

I would appreciate any guidance or help.

CodePudding user response：

Using cycle and iter to pick one element from each iterable, alternately:

from itertools import cycle

def uniques_evenly(n, *iterables):
    its = cycle(iter(seq) for seq in iterables)
    seen = set()
    it = next(its)
    for _ in range(n):
        x = next(it)
        while x in seen:
            x = next(it)  # pick next unique number
        seen.add(x)
        yield x
        it = next(its)    # switch to next iterator

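Note that this will crash if one of the iterators runs out before n elements have been collected: the StopIteration raised by next(it) inside the generator surfaces as a RuntimeError in Python 3.7 and later.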

Testing:

a = [ 1,  6, 76, 15, 46, 55, 47, 15, 72, 58, 37756, 712, 666]
b = [17, 48, 22,  7, 35, 19, 91, 85, 49, 35, 42]
c = [46,  8, 53, 49, 28, 82, 30, 86, 57,  9]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99]
e = [ 1, 53, 17, 82, 21, 20, 88, 10, 82, 41, 216]

print( list(uniques_evenly(10, a,b,c,d,e)) )
# [1, 17, 46, 82, 53, 6, 48, 8, 12, 21]
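
If some inputs may run out before n values are collected, one option is to drop exhausted iterators and keep going with the rest. The following is only a sketch of that idea, not part of the answer above; the name uniques_evenly_tolerant is made up for illustration, and it assumes the values are hashable (as the original does):

```python
def uniques_evenly_tolerant(n, *iterables):
    """Yield up to n unique values, taking turns between the iterables.

    Hypothetical variant: exhausted iterables are dropped instead of
    raising an error, so fewer than n values may be produced.
    """
    active = [iter(seq) for seq in iterables]
    seen = set()
    produced = 0
    while active and produced < n:
        still_active = []
        for it in active:
            if produced == n:
                break
            for x in it:  # advance to the next value not seen so far
                if x not in seen:
                    seen.add(x)
                    produced += 1
                    yield x
                    still_active.append(it)
                    break
            # if the inner loop finishes without break, `it` is exhausted
            # and is not carried over to the next round
        active = still_active


print(list(uniques_evenly_tolerant(10, [1, 2], [2, 3, 4])))
# [1, 2, 3, 4]
```

With the five lists from the test above it yields the same ten values in the same order; with short inputs it simply stops early instead of raising.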

CodePudding user response：

This may be a misinterpretation of what you're trying to achieve; it certainly gives a different result.

The way I read the question is that you iterate over your lists looking to see if an element in the list you're currently checking exists in any of the subsequent lists. If it does, you remove it and move on.

So, given 5 lists A, B, C, D & E we look for any value in A that occurs in any of B, C, D or E. If it's there, we remove it. Once A has been checked we move on to B and so forth.

Here's how I did it:

def pall(*args, mlen=10, slice=2):
    for i in range(len(args)-1):
        for n in args[i][:mlen]:
            for a in args[i + 1:]:
                try:
                    a.pop(a.index(n))
                    break
                except ValueError:
                    pass
    rv = []
    for a in args:
        assert len(a) >= slice
        for s in range(slice):
            rv.append(a[s])
    return rv

a = [ 1,  6, 76, 15, 46, 55, 47, 15, 72, 58] 
b = [17, 48, 22,  7, 35, 19, 91, 85, 49, 35]
c = [46,  8, 53, 49, 28, 82, 30, 86, 57,  9]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99]
e = [ 1, 53, 17, 82, 21, 20, 88, 10, 82, 41]

print(pall(a,b,c,d,e))

The result of this is: [1, 6, 17, 48, 8, 53, 12, 24, 17, 82]
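
Note that this result contains 17 twice: the break stops at the first later list that holds a value, so 17 is removed from d but stays in e. The function also modifies the lists passed to it. If duplicates must be excluded and the inputs left untouched, a set of already-selected values can be tracked instead. This is only a sketch under that reading of the question; leftmost_uniques and per_list are hypothetical names, not taken from either answer:

```python
def leftmost_uniques(*lists, per_list=2):
    """Take the `per_list` leftmost not-yet-selected values from each list.

    Lists are processed in the order given; the inputs are not modified.
    A list with too few unseen values simply contributes fewer elements.
    """
    seen = set()
    result = []
    for seq in lists:
        taken = 0
        for x in seq:
            if taken == per_list:
                break
            if x not in seen:  # skip values already selected earlier
                seen.add(x)
                result.append(x)
                taken += 1
    return result


a = [1, 6, 76, 15, 46, 55, 47, 15, 72, 58]
b = [17, 48, 22, 7, 35, 19, 91, 85, 49, 35]
c = [46, 8, 53, 49, 28, 82, 30, 86, 57, 9]
d = [82, 12, 24, 60, 66, 17, 13, 69, 28, 99]
e = [1, 53, 17, 82, 21, 20, 88, 10, 82, 41]

print(leftmost_uniques(a, b, c, d, e))
# [1, 6, 17, 48, 46, 8, 82, 12, 53, 21]
```

These are the same ten values as the expected output in the question, with the last two in a different order, since e is scanned left to right here rather than shifted.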