Community of Stackoverflow:
I'm trying to create a list of sublists with a loop based on a random sampling of values of another list; and each sublist has the restriction of not having a duplicate or a value that has already been added to a prior sublist.
Let's say (example) I have a main list:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
#I get:
[[1,13],[4,1],[8,13]]
#I WANT:
[[1,13],[4,9],[8,14]] #(no duplicates when checking previous sublists)
The real code that I thought it would work is the following (as a draft):
matrixvals=list(matrix.index.values) #list where values are obtained
lists=[[]for e in range(0,3)] #list of sublists that I want to feed
vls=[] #stores the values that have been added to prevent adding them again
for e in lists: #initiate main loop
for i in range(0,5): #each sublist will contain 5 different random samples
x=random.sample(matrixvals,1) #it doesn't matter if the samples are 1 or 2
if any(x) not in vls: #if the sample isn't in the evaluation list
vls.extend(x)
e.append(x)
else: #if it IS, then do a sample but without those already added values (line below)
x=random.sample([matrixvals[:].remove(x) for x in vls],1)
vls.extend(x)
e.append(x)
print(lists)
print(vls)
It didn't work as I get the following:
[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [13]], [[11], [7], [13], [17], [25]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 13, 11, 7, 13, 17, 25]
As you can see, number 13 is repeated 3 times, and I don't understand why
I would want:
[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [70]], [[11], [7], [100], [18], [27]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 70, 11, 7, 100, 18, 27] #no dups
In addition, is there a way to convert the sample.random results as values instead of lists? (to obtain):
[[25,16,15,31,17]], [4, 2, 13, 42,70], [11, 7, 100, 18, 27]]
Also, the final result in reality isn't a list of sublists, actually is a dictionary (the code above is a draft attempt to solve the dict problem), is there a way to obtain that previous method in a dict? With my present code I got the next results:
{'1stkey': {'1stsubkey': {'list1': [41,
40,
22,
28,
26,
14,
41,
15,
40,
33],
'list2': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33],
'list3': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33]},
'2ndsubkey': {'list1': [21,
7,
31,
12,
8,
22,
27,...}
Instead of that result, I would want the following:
{'1stkey': {'1stsubkey': {'list1': [41,40,22],
'list2': [28, 26, 14],
'list3': [41, 15, 40, 33]},
'2ndsubkey': {'list1': [21,7,31],
'list2':[12,8,22],
'list3':[27...,...}#and so on
Is there a way to solve both list and dict problem? Any help will be very appreciated; I can made some progress even only with the list problem
Thanks to all
CodePudding user response:
I realize you may be more interested in finding out why your particular approach isn't working. However, if I've understood your desired behavior, I may be able to offer an alternative solution. After posting my answer, I will take a look at your attempt.
random.sample
lets you sample k
number of items from a population
(collection, list, whatever.) If there are no repeated elements in the collection, then you're guaranteed to have no repeats in your random sample:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_samples = 4
print(sample(pool, k=num_samples))
Possible output:
[9, 11, 8, 7]
>>>
It doesn't matter how many times you run this snippet, you will never have repeated elements in your random sample. This is because random.sample
doesn't generate random objects, it just randomly picks items which already exist in a collection. This is the same approach you would take when drawing random cards from a deck of cards, or drawing lottery numbers, for example.
In your case, pool
is the pool of possible unique numbers to choose your sample from. Your desired output seems to be a list of three lists, where each sublist has two samples in it. Rather than calling random.sample
three times, once for each sublist, we should call it once with k=num_sublists * num_samples_per_sublist
:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_sublists = 3
samples_per_sublist = 2
num_samples = num_sublists * samples_per_sublist
assert num_samples <= len(pool)
print(sample(pool, k=num_samples))
Possible output:
[14, 10, 1, 8, 6, 3]
>>>
OK, so we have six samples rather than four. No sublists yet. Now you can simply chop this list of six samples up into three sublists of two samples each:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_sublists = 3
samples_per_sublist = 2
num_samples = num_sublists * samples_per_sublist
assert num_samples <= len(pool)
def pairwise(iterable):
yield from zip(*[iter(iterable)]*samples_per_sublist)
print(list(pairwise(sample(pool, num_samples))))
Possible output:
[(4, 11), (12, 13), (8, 15)]
>>>
Or if you really want sublists, rather than tuples:
def pairwise(iterable):
yield from map(list, zip(*[iter(iterable)]*samples_per_sublist))
EDIT - just realized that you don't actually want a list of lists, but a dictionary. Something more like this? Sorry I'm obsessed with generators, and this isn't really easy to read:
keys = ["1stkey"]
subkeys = ["1stsubkey", "2ndsubkey"]
num_lists_per_subkey = 3
num_samples_per_list = 5
num_samples = num_lists_per_subkey * num_samples_per_list
min_sample = 1
max_sample = 50
pool = list(range(min_sample, max_sample 1))
def generate_items():
def generate_sub_items():
from random import sample
samples = sample(pool, k=num_samples)
def generate_sub_sub_items():
def chunkwise(iterable, n=num_samples_per_list):
yield from map(list, zip(*[iter(iterable)]*n))
for list_num, chunk in enumerate(chunkwise(samples), start=1):
key = f"list{list_num}"
yield key, chunk
for subkey in subkeys:
yield subkey, dict(generate_sub_sub_items())
for key in keys:
yield key, dict(generate_sub_items())
print(dict(generate_items()))
Possible output:
{'1stkey': {'1stsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}, '2ndsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}}}
>>>