There is this problem I cannot get my head around to solve. So in order to create a dictionary, we have two ways: (1) a function call dict()
and (2) literal syntax {}
.
When I check the memory size of creation of two empty dictionaries, sys.getsizeof()
return different memory size:
import sys
my_dict1 = {}
sys.getsizeof(my_dict1)
# 64
For syntax {}
, by running this example you can see that the basic occupation of a Python dictionary is 64 bytes (8 buckets, each is 8 bytes, so 8 x 8 = 64). That makes sense to me. Since the {}
syntax is handled in byte-code it can use this optimization mentioned here.
and
import sys
my_dict2 = dict()
sys.getsizeof(my_dict2)
# 232
But for dict()
, it is 232 bytes. I think this huge difference is due to the different implementations. dict()
is handled like a regular class constructor and Python uses the generic memory allocator, which does not follow an easily predictable pattern like the free list (again, this answer is great).
However, what I do not understand starts here, even if I start my empty dictionary using {}
and begin to add elements to this empty dictionary, memory size jumps to 232 from 64 in the first iteration.
import sys
my_dict = {}
for i in range(20):
my_dict[i] = 100
print(f"elements = {i 1} size = {sys.getsizeof(my_dict)}")
# elements = 1 size = 232
# elements = 2 size = 232
# elements = 3 size = 232
# elements = 4 size = 232
# elements = 5 size = 232
# elements = 6 size = 360
# elements = 7 size = 360
# elements = 8 size = 360
# elements = 9 size = 360
# elements = 10 size = 360
# elements = 11 size = 640
# elements = 12 size = 640
# elements = 13 size = 640
# elements = 14 size = 640
# elements = 15 size = 640
# elements = 16 size = 640
# elements = 17 size = 640
# elements = 18 size = 640
# elements = 19 size = 640
# elements = 20 size = 640
In order to make Python hash table fast and reduce collisions, the interpreter keeps resizing the dictionary when it becomes full for two-third. That I know.
What I do not understand, why in the VERY FIRST insertion, the memory size goes from 64 to 232, even though empty dictionary started by {}
is 64 bytes. Because when I create an empty dictionary using {}
, I use special opcodes for the containers, instead of performing function calls.
EDIT: Python version I am using is 3.8.12.
CodePudding user response:
On Python 3.8.12, the version you're on, dict()
allocates a new "keys object" (the part of the dict that stores the hash table, the keys, and in non-"split" dicts, the values), while {}
uses a common "keys object" shared by many empty dicts. The shared keys object isn't counted towards the size of the {}
dict.
Note that as soon as you add entries to the {}
dict, it will need to allocate a non-shared keys object for its own use, so using {}
won't actually save any memory in most cases.
You can see the code for dict()
by looking at dict_new
and dict_init
, the C functions corresponding to dict.__new__
and dict.__init__
.
You can see the code for {}
in the bytecode evaluation loop, under the BUILD_MAP
opcode, and follow that down to _PyDict_NewPresized
.
You can see the code that sys.getsizeof
delegates to for a dict in dict_sizeof
, the C implementation of dict.__sizeof__
. The code that ignores the size of the shared keys object for {}
is if (mp->ma_keys->dk_refcnt == 1)
, in _PyDict_SizeOf
.