I am thinking of writing some code to get the maximum depth of a binary tree. Python can't pass a non-container parameter by reference, so there are generally two choices: use the nonlocal keyword, or pass the depth by copy.
The first one, get_max_depth1, needs extra backtracking operations (the d -= 1 after each recursive call), and I wonder whether it costs less space compared to get_max_depth2. If Python implemented nonlocal as pass-by-reference, then every call would also carry an integer pointer; in that case it would be inferior to get_max_depth2, because it's harder to write, slower to run, and saves almost no space. If not, then when the binary tree depth is 100000, get_max_depth1 only needs one variable, while get_max_depth2 needs 100000 copies of the variable d saved in their frames, so I guess it's meaningful to keep d outside.
class BinaryTreeNode:
    # minimal node class assumed here so the example runs
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def main():
    root = BinaryTreeNode()

    d = 0
    maxd1 = 0
    def get_max_depth1(node):
        # depth d lives in the enclosing scope; it is backed out after each branch
        nonlocal d, maxd1
        maxd1 = max(maxd1, d)
        if node.left:
            d += 1
            get_max_depth1(node.left)
            d -= 1
        if node.right:
            d += 1
            get_max_depth1(node.right)
            d -= 1
    get_max_depth1(root)

    maxd2 = 0
    def get_max_depth2(node, d):
        # depth d is copied into every call's local storage
        nonlocal maxd2
        maxd2 = max(maxd2, d)
        if node.left:
            get_max_depth2(node.left, d + 1)
        if node.right:
            get_max_depth2(node.right, d + 1)
    get_max_depth2(root, 0)
CodePudding user response:
If you want to find out exactly what the allocation difference is, then running the app with https://pypi.org/project/memory-profiler/ should give you the answers you're after. But that only covers the very theoretical side: the result will be specific to one CPython version and may not hold overall.
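For example, a rough sketch of such a measurement (the run_once wrapper and the call to main() from the question are only illustrative):

from memory_profiler import memory_usage  # pip install memory-profiler

def run_once():
    main()  # the main() from the question, with the tree already built

samples = memory_usage((run_once, (), {}), interval=0.01)
print(f"peak memory: {max(samples):.2f} MiB")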
In practice the answer is: they're about the same for a few reasons:
- Other code will dominate the performance (just creating the new stack frames already takes more space)
- You can't go that deep anyway (the default recursion limit is 1000 frames; see the snippet after this list)
- Your data will not be that large (binary trees are usually kept balanced, and at 100000 levels that would give you over 10^30102 elements)
- Once you start caring about single bytes and limits like that, CPython stops being the right answer. Cython may be the simplest next step, and there you can check the exact heap/stack usage of the resulting C program.
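For the recursion limit point, a minimal sketch on a stock CPython build:

import sys

print(sys.getrecursionlimit())  # 1000 by default on CPython

def go(depth):
    return go(depth + 1)

try:
    go(0)
except RecursionError as err:
    print(err)  # "maximum recursion depth exceeded"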
CodePudding user response:
Python's data model defines user-defined functions as having a __closure__ attribute, which reifies the function closure: the readable/writeable values of enclosing scopes. __closure__ is either None or a tuple of cells.
The standard currently (3.10) only defines that a cell has a cell_contents attribute to represent the value of the cell. The CellType itself makes no guarantees about what else it provides.
Notably, whether a cell is writeable is not determined by whether a function captures the closure as readable (bare usage) or readable/writeable (nonlocal declaration). Both are the same kind of cell.
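For instance, a small sketch (my own example, not from the docs) showing that a bare read and a nonlocal write produce the same kind of cell:

from types import CellType  # available since Python 3.8

def reader():
    x = 1
    def inner():
        return x  # bare usage: read-only capture
    return inner

def writer():
    x = 1
    def inner():
        nonlocal x  # declared writeable
        x += 1
        return x
    return inner

print(type(reader().__closure__[0]) is CellType)  # True
print(type(writer().__closure__[0]) is CellType)  # True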
In practice, CPython¹ represents __closure__ as a regular tuple and each cell as a 40-byte object that holds a pointer to its value.
>>> import sys
>>> def outer(a=3):
...     def inner():
...         print(a)  # `a` is captured from outer scope
...     return inner
>>> outer()
<function __main__.outer.<locals>.inner()>
>>> outer().__closure__
(<cell at 0x10eac2ca0: int object at 0x109f26d30>,)
>>> outer().__closure__[0].cell_contents
3
>>> # size of the closure tuple and cell in bytes
>>> sys.getsizeof(outer().__closure__), sys.getsizeof(outer().__closure__[0])
(48, 40)
The __closure__ itself belongs to the function object, whereas a cell is shared between all functions closing over the same variable.
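A minimal sketch of that sharing (again my own example):

def make_pair():
    x = 0
    def get():
        return x
    def bump():
        nonlocal x
        x += 1
    return get, bump

get, bump = make_pair()
print(get.__closure__[0] is bump.__closure__[0])  # True: one cell shared by both closures
bump()
print(get())  # 1: both see the update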
In contrast, local variables are stored in an array of pointers, each 8 bytes. The local storage belongs to the function call, so calling a function multiple times also creates multiple such pointers.
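To see that per-call storage directly, here is a sketch (the helper name is made up; sys._getframe is CPython-specific) that recurses like get_max_depth2 and counts how many live frames hold their own d:

import sys

def fake_descend(levels, d=0):
    if levels == 0:
        frame, frames_with_d = sys._getframe(), 0
        while frame is not None:
            if 'd' in frame.f_locals:
                frames_with_d += 1
            frame = frame.f_back
        return frames_with_d
    return fake_descend(levels - 1, d + 1)

print(fake_descend(10))  # 11: every recursive call keeps its own copy of d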
For reference, just the shell of the above inner function object is 136 bytes. Its name and fully-qualified name are 54 and 69 bytes, respectively. Its bytecode is 45 bytes. There are many additional costs for things that you likely do not even know exist.
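Those numbers can be reproduced with sys.getsizeof (a sketch; exact values depend on the CPython build):

import sys

fn = outer()  # the inner function from the session above
print(sys.getsizeof(fn))                   # function object shell, e.g. 136
print(sys.getsizeof(fn.__name__))          # 'inner', e.g. 54
print(sys.getsizeof(fn.__qualname__))      # 'outer.<locals>.inner', e.g. 69
print(sys.getsizeof(fn.__code__.co_code))  # raw bytecode, e.g. 45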
Keep that in mind when trying to save individual chunks of 8 bytes.
¹CPython 3.8.12 [Clang 11.0.0 (clang-1100.0.33.17)], 64 bit build.