Consider the code below:
a = [1, 2]
a.extend(x*2 for x in a)
print(a)
At first glance, it would seem (to the uninitiated) that the code will generate the following output:
[1, 2, 2, 4]
But it does NOT generate that output. In fact, the program never ends - it is an infinite program. You need to convert that generator expression to a list first. Like so:
a = [1, 2]
a.extend([x*2 for x in a])
print(a)
Question is, why is it an infinite program when using the first approach? I understand that in order to execute extend
, it must first evaluate the generator expression, which in turn references the original list. But here's how I would assume it works (or should work):
extend
has a generator expression as the argument, so it goes ahead and evaluates the expression first- generator expression is evaluated to
[2, 4]
- now
extend
the original lista
with[2, 4]
But looks like that's not what actually happens.
Can anyone explain what is going on, and describe why it behaves the way it does?
CodePudding user response:
Short answer:
- generator expression is evaluated to [2, 4]
Is wrong. The generator expression is evaluated lazily, an item at a time, no such list
is produced, and the item are appended one at a time, in a way that means the generator expression gets its own output as it goes.
Long answer:
Generator expressions are lazy. list.extend
doesn't need to completely evaluate the argument, it can just iterate it as it goes, appending each item as it goes. The internal implementation can be written entirely in terms of append
, e.g.:
def extend(self, it):
for x in it:
self.append(x)
In practice, it's a C version of the loop with some minor optimizations tossed in to reduce reallocations when possible and some optimizations/protections that special-case list
, tuple
and direct self-extend
s.
This is typically a good thing; the whole point of generator expressions is to avoid a wasteful temporary list
, and forcibly converting from genexpr to list
would make it slower and add additional unnecessary memory expenses. If it just converted to list
then extended every time an operation like this occurred, genexprs would be mostly pointless. So they don't.
The code does include protection against the simple case of:
mylist.extend(mylist)
but it doesn't attempt to defend against cases like this where there's indirection in the way. There's no way to defend against it in general (it can't trace through all the layers of indirection possible to determine it's really self-assignment); when they come up, you're responsible for doing it safely.
CodePudding user response:
The first line is straightforward, a
is to be a list [1, 2]
.
a = [1, 2]
The second line causes the infinite loop:
a.extend(x*2 for x in a)
What is passed to a.extend()
is the generator expression x*2 for x in a
. This generator effectively does something like this:
for x in a:
yield x*2
However, what would happen if you instead ran?:
for x in a:
yield x*2
a.append(x*2)
This would in fact run forever, since there will always be a next x
to get from the list, which keeps growing.
And that's exactly what happens here, since .extend()
exhausts the generator one item at a time, adding the result to the list, before control is returned to the generator.
However, when you run:
a = [1, 2]
a.extend([x*2 for x in a])
The square brackets cause the generator to be exhausted entirely, into a list, which is then passed to .extend()
, which adds the whole thing to a
in one go, as it is a list and not some other iterable like a generator.