A concern involving very large arrays


My concern involves huge arrays with shapes like (14!, 14), but I'll ask the question using a much smaller array.

Consider an array p holding the 10! permutations of a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. We can create a permutation array of this shape (i.e. (3628800, 10)) in a variety of ways, say:

import itertools
import numpy as np

p = np.array(list(itertools.permutations(range(10))))

QUESTION: I'd like to know if there is any way I could produce, say:

array p1 holding the first 100000 permutations, then

array p2 holding the next 100000 permutations, then

etc..., then

array p37 holding the last 28800 permutations (since 10! = 3,628,800 = 36 × 100,000 + 28,800).

I'm not talking about creating the full set of permutations, then subdividing it. What I'd like to know is whether I can actually generate the permutation rows in 'clumps' of suitable size. The actual order of rows in each 'clump' isn't an issue, as long as the full set of 'clumps' holds all permutations without any overlap.

As mentioned earlier, my actual concern is to find a way, in principle, to handle much larger arrays of permutations. I'll worry about the size of the 'clumps', etc., later.
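
For a rough sense of scale, a back-of-envelope estimate shows why the (14!, 14) case cannot realistically be materialized in full:

import math

rows = math.factorial(14)   # 87,178,291,200 permutations of 14 items
entries = rows * 14         # entries in a (14!, 14) array
print(f"{entries:,}")       # 1,220,496,076,800

Even at one byte per entry that is over a terabyte, so generating and processing the rows in clumps is the only practical route.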

CodePudding user response:

Use itertools.islice, following the batched recipe from the itertools documentation:

import numpy as np
from itertools import islice, permutations

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := tuple(islice(it, n))):
        yield batch

perm = permutations(range(10))

arrays = [np.array(x) for x in batched(perm, 100000)]
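
As a quick sanity check, the chunk count and shapes come out exactly as described in the question:

import math

print(len(arrays))        # 37
print(arrays[0].shape)    # (100000, 10)
print(arrays[-1].shape)   # (28800, 10)
print(sum(len(a) for a in arrays) == math.factorial(10))   # True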

If you would rather iterate chunk by chunk, keeping only one array in memory at a time:

perm = permutations(range(10))

for x in batched(perm, 100000):
    a = np.array(x)
    print(a)
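
On Python 3.12 or newer, this recipe is available in the standard library as itertools.batched, so the helper above can be dropped:

from itertools import batched, permutations
import numpy as np

for x in batched(permutations(range(10)), 100000):
    a = np.array(x)   # one chunk of up to 100000 rows at a time
    print(a)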