Different memory size of dict() and {} in Python-CodePudding

There is this problem I cannot get my head around to solve. So in order to create a dictionary, we have two ways: (1) a function call dict() and (2) literal syntax {}.

When I check the memory size of creation of two empty dictionaries, sys.getsizeof() return different memory size:

import sys

my_dict1 = {}
sys.getsizeof(my_dict1)
# 64

For syntax {}, by running this example you can see that the basic occupation of a Python dictionary is 64 bytes (8 buckets, each is 8 bytes, so 8 x 8 = 64). That makes sense to me. Since the {} syntax is handled in byte-code it can use this optimization mentioned here.

and

import sys

my_dict2 = dict()
sys.getsizeof(my_dict2)
# 232

But for dict(), it is 232 bytes. I think this huge difference is due to the different implementations. dict() is handled like a regular class constructor and Python uses the generic memory allocator, which does not follow an easily predictable pattern like the free list (again, this answer is great).

However, what I do not understand starts here, even if I start my empty dictionary using {} and begin to add elements to this empty dictionary, memory size jumps to 232 from 64 in the first iteration.

import sys

my_dict = {}
for i in range(20):
    my_dict[i] = 100
    print(f"elements = {i 1} size = {sys.getsizeof(my_dict)}")
    
# elements = 1 size = 232
# elements = 2 size = 232
# elements = 3 size = 232
# elements = 4 size = 232
# elements = 5 size = 232
# elements = 6 size = 360
# elements = 7 size = 360
# elements = 8 size = 360
# elements = 9 size = 360
# elements = 10 size = 360
# elements = 11 size = 640
# elements = 12 size = 640
# elements = 13 size = 640
# elements = 14 size = 640
# elements = 15 size = 640
# elements = 16 size = 640
# elements = 17 size = 640
# elements = 18 size = 640
# elements = 19 size = 640
# elements = 20 size = 640

In order to make Python hash table fast and reduce collisions, the interpreter keeps resizing the dictionary when it becomes full for two-third. That I know.

What I do not understand, why in the VERY FIRST insertion, the memory size goes from 64 to 232, even though empty dictionary started by {} is 64 bytes. Because when I create an empty dictionary using {}, I use special opcodes for the containers, instead of performing function calls.

EDIT: Python version I am using is 3.8.12.

CodePudding user response：

On Python 3.8.12, the version you're on, dict() allocates a new "keys object" (the part of the dict that stores the hash table, the keys, and in non-"split" dicts, the values), while {} uses a common "keys object" shared by many empty dicts. The shared keys object isn't counted towards the size of the {} dict.

Note that as soon as you add entries to the {} dict, it will need to allocate a non-shared keys object for its own use, so using {} won't actually save any memory in most cases.

You can see the code for dict() by looking at dict_new and dict_init, the C functions corresponding to dict.__new__ and dict.__init__.

You can see the code for {} in the bytecode evaluation loop, under the BUILD_MAP opcode, and follow that down to _PyDict_NewPresized.

You can see the code that sys.getsizeof delegates to for a dict in dict_sizeof, the C implementation of dict.__sizeof__. The code that ignores the size of the shared keys object for {} is if (mp->ma_keys->dk_refcnt == 1), in _PyDict_SizeOf.