I've defined two simple Python functions that take a single argument, raise an exception, and handle the raised exception. One function uses a variable to refer to the exception before raising/handling, the other does not:
def refcount_unchanged(x):
    try:
        raise Exception()
    except:
        pass

def refcount_increases(x):
    e = Exception()
    try:
        raise e
    except:
        pass
One of the resulting functions increases Python's reference count for its input argument on every call; the other does not:
import sys

a = []
print(sys.getrefcount(a))
for i in range(3):
    refcount_unchanged(a)
    print(sys.getrefcount(a))
# prints: 2, 2, 2, 2

b = []
print(sys.getrefcount(b))
for i in range(3):
    refcount_increases(b)
    print(sys.getrefcount(b))
# prints: 2, 3, 4, 5
Can anyone explain why this happens?
CodePudding user response:
It is a side effect of the "exception -> traceback -> stack frame -> exception" reference cycle created by the __traceback__ attribute on exception instances. The attribute was proposed in PEP 344 and shipped in Python 3.0 via PEP 3134; PEP 3110 (also Python 3.0) removes the most common source of the cycle by unbinding the "except ... as" target when the handler exits.
In refcount_increases, the reference cycle can be observed by printing this:
    except:
        print(e.__traceback__.tb_frame.f_locals)  # {'x': [], 'e': Exception()}
which shows that x is also referenced in the frame's locals.
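The same link can be checked without printing anything. The sketch below is only an illustration: cycle_is_visible is a hypothetical name, and sys._getframe() is a CPython implementation detail that may not exist on other interpreters. It confirms that the traceback's frame is the function's own running frame and that this frame holds the very object passed in as x.

```python
import sys


def cycle_is_visible(x):
    # Hypothetical helper, not part of the original question.
    e = Exception()
    try:
        raise e
    except:
        frame = e.__traceback__.tb_frame
        # The traceback points at the frame that is executing right now ...
        same_frame = frame is sys._getframe()
        # ... and that frame's locals hold the argument itself.
        holds_argument = frame.f_locals["x"] is x
        del frame
        return same_frame, holds_argument


print(cycle_is_visible([]))
```

Because e stays bound as a local, the frame refers to the exception and the exception (through its traceback) refers back to the frame, which closes the loop.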
The reference cycle is resolved when the garbage collector runs, or if gc.collect() is called.
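To see this without depending on when automatic collection happens to run, the measurement can be wrapped as below. This is a minimal sketch assuming CPython: leaked_refs is a hypothetical helper, and the automatic collector is switched off only for the duration of the measurement.

```python
import gc
import sys


def refcount_increases(x):
    e = Exception()
    try:
        raise e
    except:
        pass


def leaked_refs(func, calls=3):
    """Return (extra refs left by func, extra refs left after gc.collect())."""
    gc.collect()
    gc.disable()  # keep automatic collection from interfering
    try:
        obj = []
        before = sys.getrefcount(obj)
        for _ in range(calls):
            func(obj)
        leaked = sys.getrefcount(obj) - before
        gc.collect()  # breaks the exception/traceback/frame cycles
        after_collect = sys.getrefcount(obj) - before
    finally:
        gc.enable()
    return leaked, after_collect


print(leaked_refs(refcount_increases))
```

Each call leaves one uncollected frame behind, and each of those frames holds one reference to the argument until the collector frees it.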
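In refcount_unchanged, the exception is never bound to a local name, so the frame holds no reference back to it and no cycle forms in the first place. Binding it with "except Exception as e" would not create a lasting cycle either: as per PEP-3110's Semantic Changes, Python 3 generates additional bytecode to delete the target, thus eliminating the reference cycle. That is, a handler such as:
def refcount_unchanged(x):
    try:
        raise Exception()
    except Exception as e:
        pass
gets translated to something like: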
def refcount_unchanged(x):
    try:
        raise Exception()
    except Exception as e:
        try:
            pass
        finally:
            e = None
            del e
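The implicit deletion is easy to observe: after the handler exits, the target name is no longer bound. The function below is a hypothetical illustration (its name and the ValueError are not from the original post).

```python
def handler_target_is_deleted():
    try:
        raise ValueError("boom")
    except ValueError as e:
        message = str(e)  # copy what is needed before the handler ends
    try:
        _ = e  # the name was unbound when the handler exited
    except NameError as err:  # UnboundLocalError is a subclass of NameError
        return message, type(err).__name__


print(handler_target_is_deleted())
```

This is also why anything needed from the exception has to be copied into another name inside the handler.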
Resolving the reference cycle in refcount_increases
While not necessary (since the garbage collector will do its job), you can do something similar in refcount_increases by manually deleting the variable reference:
def refcount_increases(x):
    e = Exception()
    try:
        raise e
    except:
        pass
    finally:   # +
        del e  # +
Alternatively, you can overwrite the variable reference and let the implicit deletion work:
def refcount_increases(x):
    e = Exception()
    try:
        raise e
    # except:               # -
    except Exception as e:  # +
        pass
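Both variants can be checked with the same kind of refcount measurement used in the question. This is a sketch assuming CPython; fixed_with_del, fixed_with_rebinding and refcount_growth are hypothetical names introduced here, and the automatic collector is disabled during the measurement so it cannot hide a leak.

```python
import gc
import sys


def fixed_with_del(x):
    e = Exception()
    try:
        raise e
    except:
        pass
    finally:
        del e  # drop the frame's reference to the exception


def fixed_with_rebinding(x):
    e = Exception()
    try:
        raise e
    except Exception as e:  # target is unbound when the handler exits
        pass


def refcount_growth(func, calls=3):
    """Return how many extra references func leaves on a fresh list."""
    gc.collect()
    gc.disable()  # make sure the cycle collector is not doing the cleanup
    try:
        obj = []
        before = sys.getrefcount(obj)
        for _ in range(calls):
            func(obj)
        return sys.getrefcount(obj) - before
    finally:
        gc.enable()


print(refcount_growth(fixed_with_del))
print(refcount_growth(fixed_with_rebinding))
```

In both cases the exception is freed by plain reference counting as soon as the last name for it goes away, so no frame outlives the call.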
A little more about the reference cycle
The exception e and other local variables are actually referenced directly by e.__traceback__.tb_frame, presumably in C code. This can be observed by printing this:
print(sys.getrefcount(b))
print(gc.get_referrers(b)[0]) # <frame at ...>
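Accessing e.__traceback__.tb_frame.f_locals creates a dictionary cached on the frame (another reference cycle) and thwarts the proactive resolutions above. This is the behavior of CPython before 3.13, where f_locals returns a cached dictionary rather than a proxy: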
print(sys.getrefcount(b))
print(gc.get_referrers(b)[0]) # {'x': [], 'e': Exception()}
However, this reference cycle will also be handled by the garbage collector.
CodePudding user response:
It seems that writing out the question helped us realize part of the answer. If we garbage-collect after each call to refcount_increases, the refcount no longer increases. Interesting! I don't think this is a complete answer to our question, but it's certainly suggestive. Any further information would be welcome.
import gc

c = []
print(sys.getrefcount(c))
for i in range(3):
    refcount_increases(c)
    gc.collect()
    print(sys.getrefcount(c))
# prints: 2, 2, 2, 2