Iterate over all the sub-groups of a list

Let's say I have a list [1,2,3,4,5,6], and I want to iterate over all the consecutive, non-overlapping sub-groups of length 2: [1,2], [3,4], [5,6].

The naive way of doing it:

    L = [1,2,3,4,5,6]
    N = len(L)//2
    for k in range(N):
        slice = L[k*2:(k+1)*2]
        for val in slice:
            pass  # Do things with val

However, I was wondering if there is a more Pythonic way to iterate over an already "partitioned" list. I also accept solutions with numpy arrays. Something like:

    L = [1,2,3,4,5,6]
    slices = f(L,2)  # A nice "f" here?
    for slice in slices:
        for val in slice:
            pass  # Do things with val

Thanks a lot!

Answer:

Use the grouper recipe from the itertools documentation (it is a recipe you copy into your code, not a function exported by the itertools module):

    import itertools

    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return itertools.zip_longest(*args, fillvalue=fillvalue)

    L = [1,2,3,4,5,6]
    for group in grouper(L, 2):
        print(group)  # (1, 2), then (3, 4), then (5, 6)
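The answer doesn't spell out why the recipe works, so here is a small standard-library sketch (variable names are only for illustration). The key is that `[iter(iterable)] * n` holds n references to the *same* iterator, so every tuple `zip` builds draws n consecutive items from it. The sketch also shows what happens when the length is not a multiple of n.

```python
import itertools

# n references to ONE iterator: zip pulls one item from each
# reference in turn, so each tuple consumes consecutive items.
it = iter([1, 2, 3, 4, 5, 6])
args = [it] * 2            # the same iterator object, twice
print(args[0] is args[1])  # True

pairs = list(zip(*args))
print(pairs)  # [(1, 2), (3, 4), (5, 6)]

# Uneven length: zip_longest pads the last group with fillvalue,
# while plain zip silently drops the leftover element.
odd = [1, 2, 3, 4, 5]
padded = list(itertools.zip_longest(*[iter(odd)] * 2, fillvalue=None))
truncated = list(zip(*[iter(odd)] * 2))
print(padded)     # [(1, 2), (3, 4), (5, None)]
print(truncated)  # [(1, 2), (3, 4)]
```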

Answer:

To get a nice f like the one you are asking for (I am not commenting on whether it is really a good idea; that depends on what you are actually trying to do), I would go with itertools:

    itertools.islice(itertools.pairwise(L), 0, None, 2)

is your f (itertools.pairwise requires Python 3.10 or later). Note that L is a list here, but it could be any iterable, which is the whole point of itertools: L could yield billions of elements, and my generator would iterate over all of them without storing them. That holds as long as L itself is not held in memory and you don't accumulate the slices in memory while processing them (if you do, this method uses as much memory as any other).

Usage example

    import itertools
    L = [1,2,3,4,5,6]
    for p in itertools.islice(itertools.pairwise(L), 0, None, 2):
        print(p)

Output:

    (1, 2)
    (3, 4)
    (5, 6)

Explanation

itertools.pairwise iterates over pairs, so it is almost what you are looking for, except that the pairs overlap.

In your case, it yields (1,2), (2,3), (3,4), (4,5), (5,6).

itertools.islice(it, 0, None, 2) yields every second element of it, starting with the first.

Combining the two, you get the 1st, 3rd, 5th, ... pairs of the pairwise iterator, which is exactly what you want.

Timings

Doing nothing in the loop body, with 1000 elements:

| Method    | Timing |
|-----------|--------|
| Yours     | 94 ms  |
| Variant   | 52 ms  |
| numpy     | 187 ms |
| itertools | 48 ms  |

Note: what I call "variant" is almost the same as your method (though not with the same timing!), but it avoids computing k*2:

    for k in range(0,len(L),2):
        slice = L[k:k+2]
        for val in slice:
            pass  # Do things with val

The fact that it is so fast (almost as fast as mine) shows how small all these differences are: all I did was avoid two multiplications per iteration, and that almost halves the timing.

Note 2: numpy is inefficient in this example precisely because the loop does nothing but iterate, so building the array is the dominant cost. Depending on what you want to do, though, numpy can be much faster than any other method if you can avoid Python-level iteration altogether.

For example (picking an arbitrary computation), if you want the sum of a+2b over every pair (a,b) of L, numpy's a[:,0].sum() + a[:,1].sum()*2 (with a being L as an (N, 2) array of pairs) would beat any iteration-based method, even with itertools.
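A sketch of that numpy computation, assuming `len(L)` is even so that `reshape(-1, 2)` succeeds. The loop-based version is included only to check that both give the same result.

```python
import numpy as np

L = [1, 2, 3, 4, 5, 6]

# View the flat list as rows of 2: [[1, 2], [3, 4], [5, 6]].
# reshape(-1, 2) raises ValueError if len(L) is odd.
a = np.array(L).reshape(-1, 2)

# Vectorized: sum of first + 2*second over all pairs, no Python loop.
vectorized = a[:, 0].sum() + a[:, 1].sum() * 2

# Loop-based equivalent, pairing even- and odd-indexed elements.
looped = sum(x + 2 * y for x, y in zip(L[0::2], L[1::2]))

print(vectorized, looped)  # 33 33
```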

But from what we know of your problem (namely, that you want to iterate), my itertools method is the fastest of the ones I timed. And since it is a one-liner, I guess it is also the most Pythonic.