python strings are immutable. Why did this happen then?

Time:06-10

Strings in Python are immutable, which means their value cannot be changed. I was testing this out, but it looks like the original string is modified (the id stays the same). I'm just trying to understand the concept.

>>> s = 'String'
>>> i = 5
>>> while i != 0:
...     s += str(i)
...     print(s + " stored at " + str(id(s)))
...     i -= 1
... 
String5 stored at 139841228476848
String54 stored at 139841228476848
String543 stored at 139841228476848
String5432 stored at 139841228476848
String54321 stored at 139841228476848
>>> a = "hello"
>>> id(a)
139841228475760
>>> a = "b" + a[1:]
>>> print(a)
bello
>>> id(a)
139841228475312

CodePudding user response:

This is a CPython-specific optimization for the case where the str being appended to has no other living references. In that case the interpreter "cheats": it resizes the existing string (the reallocation may happen in place, depending on heap layout) and appends the new data directly. This often reduces the work significantly in loops that repeatedly concatenate, making them behave more like the amortized O(1) appends of a list than an O(n) copy on every iteration. The only visible effect is the unchanged id, so the optimization is legal: no one holding an existing reference to a str ever sees it change, because it only applies when the str was logically being replaced anyway.
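To make the "no one with an existing reference ever sees it change" point concrete, here is a minimal sketch. The helper name `append_digit` is made up for illustration and is not part of the question's code; it only shows that a second reference keeps the old value after `+=`.

```python
def append_digit(text, digit):
    """Append a digit to text while keeping a second reference to the original."""
    original = text       # second reference to the same str object
    text += str(digit)    # rebinds the local name to the concatenated result
    return original, text


before, after = append_digit("String", 5)
print(before)  # String
print(after)   # String5
```

Because `original` still refers to the old object, the interpreter has to build a new string for the result, and the old value stays exactly as it was.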

You're not actually supposed to rely on it (interpreters without reference counting can't use this trick, since they can't know whether the str has other references), per PEP 8's very first programming recommendation:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
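As a rough sketch of the ''.join() form that the quoted recommendation describes, the loop from the question could collect the pieces in a list and join them once at the end. The function name `build_with_join` is just an illustrative choice:

```python
def build_with_join(count):
    """Build 'String' followed by count, count-1, ..., 1 with one final join."""
    parts = ["String"]
    for i in range(count, 0, -1):
        parts.append(str(i))   # list appends are amortized O(1)
    return "".join(parts)      # a single linear-time concatenation


result = build_with_join(5)
print(result)  # String54321
```

This does not depend on reference counting, so it should stay linear on any Python implementation.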

If you want to break the optimization, there are all sorts of ways to do so, e.g. changing your code to:

>>> while i != 0:
...     s += str(i)
...     s2 = s  # Gonna save off an alias here
...     print(s + " stored at " + str(id(s)))
...     i -= 1
... 

breaks it by creating an alias, increasing the reference count and telling Python that the change would be visible somewhere other than s, so it can't apply it. Similarly, code like:

s = s + a + b

can't use it, because s + a occurs first, and produces a temporary that b must then be added to, rather than immediately replacing s, and the optimization is too brittle to try to handle that. Almost identical code like:

s += a + b

or:

s = s + (a + b)

restores the optimization by ensuring the final concatenation is always one where s is the left operand and the result is used to immediately replace s.

CodePudding user response:

Regardless of implementation details, the docs say:

… Two objects with non-overlapping lifetimes may have the same id() value.

The previous object referenced by s no longer exists after the +=, so the new object breaks no rules by having the same id.
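The flip side of that rule is that two objects alive at the same time must have different ids. A small sketch of this, assuming a hypothetical helper `ids_while_both_alive` that keeps the old string referenced while the new one is created:

```python
def ids_while_both_alive(base, suffix):
    """Return the ids and values of a string and its extended copy, both still alive."""
    old = base            # keeps the original object alive
    new = base + suffix   # non-empty suffix, so this is a different value and object
    return id(old), id(new), old, new


old_id, new_id, old, new = ids_while_both_alive("hello", "!")
print(old_id != new_id)  # True
print(old, new)          # hello hello!
```

Once the old object is released, as happens in the question's loop, nothing prevents a later object from getting the same id.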
