If Python strings are immutable, why does it keep the same id if I use += to append to it?

Strings in Python are immutable, which means the value cannot be changed. However, when appending to the string in the following example, it looks like the original string memory is modified since the id remains the same:

>>> s = 'String'
>>> for i in range(5, 0, -1):
...     s += str(i)
...     print(f"{s:<11} stored at {id(s)}")
... 
String5     stored at 139841228476848
String54    stored at 139841228476848
String543   stored at 139841228476848
String5432  stored at 139841228476848
String54321 stored at 139841228476848

Conversely, in the following example, the id changes:

>>> a = "hello"
>>> id(a)
139841228475760
>>> a = "b" + a[1:]
>>> print(a)
bello
>>> id(a)
139841228475312

CodePudding user response：

It's a CPython-specific optimization for the case when the str being appended to happens to have no other living references. The interpreter "cheats" in this case, allowing it to modify the existing string by reallocating (which can be in place, depending on heap layout) and appending the data directly, and often reducing the work significantly in loops that repeatedly concatenate (making it behave more like the amortized O(1) appends of a list rather than O(n) copy operations each time). It has no visible effect besides the unchanged id, so it's legal to do this (no one with an existing reference to a str ever sees it change unless the str was logically being replaced).

You're not actually supposed to rely on it (non-reference-counted interpreters can't use this trick, since they can't know if the str has other references), per PEP 8's very first programming recommendation:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
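As a minimal sketch of what that recommendation looks like in practice, the two functions below build the same string as the loop in the question, once with repeated += and once with ''.join(). The function names and parameters are made up for illustration.

```python
def countdown_concat(prefix, start):
    """Build the string with repeated +=; fast only where the
    CPython in-place optimization happens to apply."""
    s = prefix
    for i in range(start, 0, -1):
        s += str(i)
    return s


def countdown_join(prefix, start):
    """Collect the pieces in a list and join once at the end;
    linear time on any Python implementation."""
    parts = [prefix]
    parts.extend(str(i) for i in range(start, 0, -1))
    return "".join(parts)


print(countdown_join("String", 5))  # String54321
```

Both functions return the same value; the join version just doesn't depend on how the interpreter handles the intermediate strings.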

If you want to break the optimization, there are all sorts of ways to do so, e.g. changing your code to:

>>> s = 'String'
>>> i = 5
>>> while i != 0:
...     s_alias = s  # Gonna save off an alias here
...     s += str(i)
...     print(s + " stored at " + str(id(s)))
...     i -= 1
...

breaks it by creating an alias, increasing the reference count and telling Python that the change would be visible somewhere other than s, so it can't apply it. Similarly, code like:

s = s + a + b

can't use it, because s + a is evaluated first and produces a temporary that b must then be added to, rather than immediately replacing s, and the optimization is too brittle to try to handle that. Almost identical code like:

s += a + b

or:

s = s + (a + b)

restores the optimization by ensuring the final concatenation is always one where s is the left operand and the result is used to immediately replace s.
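The aliasing case can be checked without looking at id values at all. A small sketch (variable names are illustrative): as long as a second reference exists, appending has to produce a separate object, and the alias still sees the old value.

```python
s = "String"
s_alias = s   # second reference to the same str object
s += "5"      # the old object is still referenced, so a new one is built

print(s_alias)       # String
print(s)             # String5
print(s is s_alias)  # False
```

This is why the optimization is invisible: nothing holding a reference to a str ever observes its contents changing.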

CodePudding user response：

Regardless of implementation details, the docs say:

… Two objects with non-overlapping lifetimes may have the same id() value.

The previous object referenced by s no longer exists after the +=, so the new object breaks no rules by having the same id.
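The flip side of that rule is that objects whose lifetimes do overlap must have different ids. A rough sketch, using a hypothetical `kept` list to hold on to every intermediate string so none of them can be freed:

```python
s = "String"
kept = []  # keeps every earlier string alive
ids = []
for i in range(5, 0, -1):
    kept.append(s)     # the old value stays referenced from the list
    s += str(i)        # so each append has to create a new object
    ids.append(id(s))

print(len(set(ids)))  # 5
```

With every intermediate string still alive, all five ids are distinct, whereas the loop in the question lets each old string die and so is free to reuse the same id.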

CodePudding user response：

When you change a string using +=, you are rebinding the name to a new value. For example, in the code s = "H" followed by s += "I", the new value of s is the old value plus "I", so s becomes "H" + "I", which is "HI".