I went through the post https://rushter.com/blog/python-strings-and-memory/
Based on that article,
- Depending on the kind of characters a string contains, every character in that string is stored using 1, 2, or 4 bytes
- Since each character in a given string has the same fixed width (1, 2, or 4 bytes), we can find the address of index i using starting_pos_address + no_of_bytes * index
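To make the two points above concrete, here is a minimal sketch. It assumes CPython 3.3 or later, where growing a string by one character grows its reported size by exactly the per-character width. The names bytes_per_char and char_offset are hypothetical helpers made up for this illustration, not part of any library.

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate how many bytes one character occupies in a string made of ch."""
    # Assumption: CPython 3.3+ compact string layout. Adding one more
    # character increases the object size by the per-character width.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

def char_offset(index, width):
    """Hypothetical helper: byte offset of index i from the start of the character data."""
    return width * index

# One ASCII character, one character above U+00FF, one above U+FFFF.
for ch in ("a", "\u20ac", "\U0001F600"):
    width = bytes_per_char(ch)
    print(repr(ch), "width:", width, "offset of index 3:", char_offset(3, width))
```

The offset is relative to the start of the character data, so the absolute address would be starting_pos_address plus that offset.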
But the code below seems to contradict this model of a string being stored as a contiguous block of characters. It looks more like an array of references/pointers to individual characters/strings, since the "o" in both strings points to the same object:
>>> s1 = "hello"
>>> s2 = "world"
>>> id(s1[4])
140195535215024
>>> id(s2[1])
140195535215024
So, should I see a string as an array of characters or as an array of references to character objects?
CodePudding user response:
The key piece of information can be read in this answer to a similar question: "Indexing into a string creates a new string". This means that both s1[4] and s2[1] create a new string, "o". Because CPython reuses (interns) one-character strings like this one, both expressions end up referring to the same object in memory, and that object is not the character that was stored inside either of the original strings.
So yes, strings are stored as arrays of characters, not as arrays of references to character objects.
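A small sketch of that behaviour, reusing s1 and s2 from the question. Whether the two results are the same object is a CPython implementation detail, so the identity check is printed rather than relied upon. The encoded bytes are a copy, not the internal buffer, but for an ASCII-only string they should match the one-byte-per-character layout the article describes.

```python
s1 = "hello"
s2 = "world"

# Indexing returns a str of length 1, not a reference into s1 or s2.
c1 = s1[4]
c2 = s2[1]

print(type(c1).__name__, len(c1))   # the result is itself a string
print(c1 == c2)                     # equal by value
print(c1 is c2)                     # same object only if the interpreter reuses it

# The characters of s1 as consecutive byte values (ASCII-only string).
print(list(s1.encode("ascii")))
```

Comparing with == is the portable check; id() and is only reveal how a particular interpreter happens to reuse objects.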