I have a number of characters n. I want to sample(randomly) from all possible permutations of these n characters, say m samples.
what i did so far works up to n=10 chars, but i need it to work for n=40 chars:
n=10 #nr of characters to be permuted. currently works for 10 max
seq_to_permute = np.arange(1,n 1) #creating the "chars" to be permuted
rand_list = list(permutations(seq_to_permute)) #storing all possible permutations in list->throws MemoryError
# need to sample many times from the rand_list
perm_samples = []
for i in range (1000):
perm_samples.append(rand_list[random.randint(0, len(rand_list)-1)])
When i try the above code with n=40 then i get memory error, as I am storing all possible permutations in a list to sample from it. From reading around I understood that it is not possible to not have memory errors when storing the list of all permutations. So according to this answer (Prevent memory error in itertools.permutation) i created an iterator from permutation module, which as I understand doesnt store all possible permutations in memory. Then tried to sample from there but I am still getting the memory error:
seq_to_permute = np.arange(1,n 1) #creating the "chars" to be permuted
print(seq_to_permute)
# rand_list = list(permutations(seq_to_permute)) throws memory error when u store them all
perm_iterator = itertools.permutations(seq_to_permute)
# need to sample many times from the rand_list
perm_samples = []
perm_samples = (random.sample(list(perm_iterator), 5))
any help on how to randomly sample some permutations of characters without storing all possible permutations would be greatly appreciated
CodePudding user response:
you can predetermine the samples you want to take if you know their total number, from Get the total number of permutations in python
then construct the list of samples that you want to keep.
from math import factorial
import random
import numpy as np
import itertools
def nPr(n, r):
return int(factorial(n)/factorial(n-r))
n = 10
seq_to_permute = np.arange(1,n 1) #creating the "chars" to be permuted
print(seq_to_permute)
perm_iterator = itertools.permutations(seq_to_permute)
seq_to_permute_length = nPr(n,n) # precompute the length using maths
print(f"searching {seq_to_permute_length} samples")
# precompute the indicies to take
indicies = []
for i in range(1000):
indicies.append(random.randint(0, seq_to_permute_length-1))
indicies = set(indicies) # for some performance gain
# take the needed indicies
picked_samples = []
for i, number in enumerate(perm_iterator):
if i in indicies:
picked_samples.append(number)
print(picked_samples)
# # faster path https://stackoverflow.com/a/54027494/15649230
# picked_samples = []
# for idx in indicies:
# elements = []
# for j in range(len(seq_to_permute)):
# elements.append(elt(len(seq_to_permute),idx,j))
# picked_samples.append(seq_to_permute[elements])
since the number of permutations grows as the factorial of n
you need until the end of eternity to calculate the permutations on 40 elements, so instead we know that we are randomly and uniformly sampling permutations of 40 elements, so we can just randomly permute the array and be done with it.
import numpy as np
n = 40
seq_to_permute = np.arange(1,n 1) #creating the "chars" to be permuted
print(seq_to_permute)
results = [] # store result in set() to avoid duplicates
while len(results) < 1000:
results.append(tuple(seq_to_permute[np.random.permutation(n)]))
print(results)
another way would be to calculate the i'th permutation of an array, by dividing i into n binary numbers and computing the i'th permutation directly, but that's just a longer way to the same result, and would be similar to this answer How to get the ith element of the nth permutation of a sequence directly (without any recursivity)?