Let's say I have a list `[1,2,3,4,5,6]`, and I want to iterate over all the subgroups of length 2: `[1,2]`, `[3,4]`, `[5,6]`.
The naive way of doing it:

```python
L = [1, 2, 3, 4, 5, 6]
N = len(L) // 2
for k in range(N):
    slice = L[k*2:(k+1)*2]
    for val in slice:
        # Do things with the slice
        ...
```
However, I was wondering if there is a more Pythonic way to iterate over such a "partitioned" list. I also accept solutions with numpy arrays. Something like:

```python
L = [1, 2, 3, 4, 5, 6]
slices = f(L, 2)  # a nice "f" here?
for slice in slices:
    for val in slice:
        # Do things with the slice
        ...
```

Thanks a lot!
CodePudding user response:
Use the `grouper` recipe from the `itertools` library:
```python
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

L = [1, 2, 3, 4, 5, 6]
for slice in grouper(L, 2):
    print(slice)
```
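A side note on the recipe: when the length of the input is not a multiple of `n`, `zip_longest` pads the last chunk with `fillvalue`. A quick sketch:

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

# With an odd-length list, the last chunk is padded:
print(list(grouper([1, 2, 3, 4, 5], 2, fillvalue=0)))
# [(1, 2), (3, 4), (5, 0)]
```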
CodePudding user response:
To have a nice `f` as you are asking (not commenting on whether it is really a good idea, which depends on what you are actually trying to do), I would go with `itertools`:

```python
itertools.islice(itertools.pairwise(L), 0, None, 2)
```

is your `f`. Note that `L` is a list here, but it could be any iterable, which is the point of `itertools`: you could have billions of elements in `L`, and therefore billions of iterations with this generator, without using any memory. That holds as long as `L` itself is not in memory and as long as what you do with the slices is not stacking them in memory (if you do, then the method is just the same as any other).
Usage example:

```python
import itertools

L = [1, 2, 3, 4, 5, 6]
for p in itertools.islice(itertools.pairwise(L), 0, None, 2):
    print(p)
```

Output:

```
(1, 2)
(3, 4)
(5, 6)
```
Explanation
`itertools.pairwise` iterates by pairs, so it is almost what you are looking for, except that the pairs are "overlapping": in your case, it yields (1,2), (2,3), (3,4), (4,5), (5,6).

`itertools.islice(it, 0, None, 2)` takes every second element of an iterator.

So, combining both, you get the 1st, 3rd, 5th, ... pairs of the previous iterator, which is exactly what you want.
Timings
Doing nothing, with 1000 elements:

| method    | Timing |
|-----------|--------|
| Yours     | 94 ms  |
| Variant   | 52 ms  |
| numpy     | 187 ms |
| itertools | 48 ms  |
Note: what I call "variant" is almost the same as your method (not the same timings, though!), avoiding the `k*2`:

```python
for k in range(0, len(L), 2):
    slice = L[k:k+2]
    for val in slice:
        ...
```

The fact that it is so fast (almost as fast as mine) says a lot about how negligible all this is: all I did was avoid two multiplications, and it almost halved the timing.
Note 2: numpy is inefficient in this example precisely because we do nothing in this question but iterate, so building the array is what costs. But depending on what you want to do, numpy can be way faster than any other method, if you can avoid iterating at all.
For example (just using a random one), if what you want to do is compute the sum of `a + 2*b` over every pair `(a, b)` of `L`, then numpy's `a[:, 0].sum() + a[:, 1].sum() * 2` would beat any iteration-based method, even with itertools.
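A sketch of that vectorized version, assuming the length of `L` is even so it reshapes cleanly into pairs:

```python
import numpy as np

L = [1, 2, 3, 4, 5, 6]

# Reshape into a (N, 2) array: each row is one (a, b) pair
a = np.asarray(L).reshape(-1, 2)

# Sum of a + 2*b over all pairs, with no Python-level loop
total = a[:, 0].sum() + a[:, 1].sum() * 2
print(total)  # (1 + 3 + 5) + 2 * (2 + 4 + 6) = 33
```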
But, well, from what we know of your problem (that is, that you want to iterate), my `itertools` method is so far the fastest. And since it is a one-liner, I guess it is also the most Pythonic.