Python: Random numbers from multiple lists without repetition

I've been trying to create random lists of 15 numbers, picking exactly one number from each of 15 available lists and without repeating any number.

The code below does that, but it only ever produces two different lists. I'd like to get rid of this limitation.


import random

n1 = list(range(1, 5))
n2 = list(range(2, 5))
n3 = list(range(3, 6))
n4 = list(range(5, 8))
n5 = list(range(6, 10))
n6 = list(range(8, 12))
n7 = list(range(10, 13))
n8 = list(range(11, 15))
n9 = list(range(13, 17))
n10 = list(range(14, 18))
n11 = list(range(16, 20))
n12 = list(range(18, 21))
n13 = list(range(20, 23))
n14 = list(range(22, 24))
n15 = list(range(23, 25))
for i in range(10):
  lista = random.sample(list(zip(n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, n13, n14, n15)), 1)
  print(lista)

CodePudding user response：

When you do something like

zip([1,2,3,4],[5,6,7,8])

the resulting output is only the pairs

[(1, 5), (2, 6), (3, 7), (4, 8)]

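so you're not getting stuff like (1, 6) or (2, 5) as possible options. zip also stops as soon as its shortest input runs out, and your shortest lists have two elements, which is why you only ever see two different results. If you really want to do something like this, you should instead take a Cartesian product, like so: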

itertools.product([1,2,3,4],[5,6,7,8])

This will give you every possible combination. For example:

>>> import itertools, random
>>> random.choice(list(itertools.product([1,2,3,4],[5,6,7,8])))
(1, 6)

However, if you actually try a 15-way Cartesian product of lists that each have more than one or two elements, the resulting list will be enormous and may not fit in memory.
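To gauge how large the product would get before building it, you can multiply the list lengths. This is a minimal sketch; product_size is a hypothetical helper name, not something from the question:

```python
import math

def product_size(*lists):
  # Number of tuples itertools.product(*lists) would yield,
  # computed without materializing any of them.
  return math.prod(len(lst) for lst in lists)

print(product_size([1, 2, 3, 4], [5, 6, 7, 8]))  # 16
```

For the fifteen lists in the question, the lengths multiply out to 4^7 * 3^6 * 2^2 = 47,775,744 tuples of 15 numbers each.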

Also, if the lists overlap, you'd have to filter out every option where the same number is picked more than once.
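For small inputs, that filtering can be done lazily while iterating over the product. A sketch, where distinct_tuples is a hypothetical name introduced for illustration:

```python
import itertools

def distinct_tuples(*lists):
  # Walk the Cartesian product lazily and keep only the tuples
  # whose entries are all different.
  for combo in itertools.product(*lists):
    if len(set(combo)) == len(combo):
      yield combo

print(list(distinct_tuples([1, 2], [2, 3])))
# [(1, 2), (1, 3), (2, 3)]
```

This avoids storing the whole product, but it still visits every combination, so it does not make a 15-list problem any faster.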

The easiest way to get a random list with no repetitions, with each element chosen from its own list, is to pick element by element and start over if you hit a dead end:

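import random

def pick_unique_elements_from_lists(*args):
  while True:
    result = []
    already_chosen = set()
    for arg in args:
      valid_choices = [n for n in arg if n not in already_chosen]

      if not valid_choices:
        # Dead end: every number in this list is already used,
        # so abandon this attempt and start over.
        break

      choice = random.choice(valid_choices)

      result.append(choice)
      already_chosen.add(choice)
    else:
      # The for loop finished without hitting a dead end.
      return tuple(result)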

However, while the choices here will be random, we might wonder if they'll be uniformly random. For instance, let's say that we want to pick a 4-tuple with the first element from [1,2], the second from [1,3], the third from [1,4], and the fourth from [1,5]. There are only five ways to do this:

(1, 3, 4, 5)
(2, 1, 4, 5)
(2, 3, 1, 5)
(2, 3, 4, 1)
(2, 3, 4, 5)

So if we were sampling uniformly at random, we'd expect the first element of the tuple to be 2 exactly 80% of the time, since four of the five tuples start with 2. This is not, however, what the pick_unique_elements_from_lists function does; if you try it, you'll find that it gives you the tuple that starts with 1 about 50% of the time, because the first element is drawn evenly from [1,2] before anything else is considered.
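A quick simulation makes the bias visible. The sketch below restates the element-by-element picker in compact form (pick_sequentially is a hypothetical name) and counts how often the first element comes out as 1:

```python
import random
from collections import Counter

def pick_sequentially(*lists):
  # Compact restatement of the element-by-element idea:
  # restart whenever some list has no unused number left.
  while True:
    result = []
    for lst in lists:
      options = [n for n in lst if n not in result]
      if not options:
        break
      result.append(random.choice(options))
    else:
      return tuple(result)

random.seed(0)  # fixed seed only so that the run is repeatable
trials = 10_000
firsts = Counter(
  pick_sequentially([1, 2], [1, 3], [1, 4], [1, 5])[0]
  for _ in range(trials)
)
# Expect a value near 0.5; uniform sampling would give 0.2.
print(firsts[1] / trials)
```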

There's another drawback to the pick_unique_elements_from_lists function: if you give it a sequence of arguments from which it's impossible to pick any tuple of distinct elements, e.g.

[1, 2], [1, 2], [1, 2]

(three picks but only two distinct numbers), then it will just spin forever trying to come up with a valid sample.
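One way to guard against this is to check up front whether any valid tuple exists at all. This is not part of the function above; the sketch below swaps in a standard augmenting-path bipartite matching (lists on one side, numbers on the other), and has_valid_tuple is a hypothetical name:

```python
def has_valid_tuple(*lists):
  # owner[n] = index of the list currently assigned the number n
  owner = {}

  def try_assign(i, seen):
    # Try to give list i a number, re-homing other lists if needed.
    for n in lists[i]:
      if n in seen:
        continue
      seen.add(n)
      if n not in owner or try_assign(owner[n], seen):
        owner[n] = i
        return True
    return False

  # Every list must end up with its own number.
  return all(try_assign(i, set()) for i in range(len(lists)))

print(has_valid_tuple([1, 2], [1, 3], [1, 4], [1, 5]))  # True
print(has_valid_tuple([1, 2], [1, 2], [1, 2]))          # False
```

Calling this before sampling lets you raise an error instead of looping forever.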

If you need uniform sampling, I can see three approaches:

1. Actually enumerate every single possible tuple you can get, and then select one of those at random.
2. Do accept/reject sampling: draw random sequences from the complete universe of numbers and reject any that don't fit your lists. This runs in bounded space, but could take an extremely long time.
3. Come up with a clever bijective enumeration of valid samples and use it to construct a clean sampling algorithm. I have no idea how hard this would be, although I warn you that problems that sound like this can often be extremely hard.
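The first approach can be sketched with a small backtracking enumeration, which abandons a partial tuple as soon as it would repeat a number. The names enumerate_valid and sample_uniform are hypothetical, and the sketch assumes the number of valid tuples is small enough to hold in memory:

```python
import random

def enumerate_valid(lists, prefix=()):
  # Yield every tuple that takes one number from each list
  # with no number used twice.
  if len(prefix) == len(lists):
    yield prefix
    return
  for n in lists[len(prefix)]:
    if n not in prefix:
      yield from enumerate_valid(lists, prefix + (n,))

def sample_uniform(*lists):
  valid = list(enumerate_valid(lists))
  if not valid:
    raise ValueError("no tuple of distinct numbers exists")
  # Every valid tuple is equally likely to be chosen.
  return random.choice(valid)

print(list(enumerate_valid(([1, 2], [1, 3], [1, 4], [1, 5]))))
# [(1, 3, 4, 5), (2, 1, 4, 5), (2, 3, 1, 5), (2, 3, 4, 1), (2, 3, 4, 5)]
```

Unlike the element-by-element picker, this raises an error instead of looping when no valid tuple exists.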

Here's an example of how you could do the accept/reject approach:

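import random

def accept_reject_from_lists(*args):
  # random.sample needs a sequence (sets are rejected since Python 3.11),
  # so turn the union of all numbers into a list.
  universe = list(set().union(*args))
  found = False
  while not found:
    # Draw len(args) distinct numbers in random order, then accept only
    # if the i-th number actually belongs to the i-th list.
    candidate = random.sample(universe, len(args))
    found = all(c in arg for c, arg in zip(candidate, args))
  return tuple(candidate)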

This still has the downside that it will go into an infinite loop if there aren't any tuples satisfying your conditions (or raise ValueError if there are fewer distinct numbers than lists). It also might take extremely long, depending on how much overlap there is between your lists. On the plus side, it won't run out of memory and crash if you give it a huge problem to solve.