When to avoid string interning

Time:02-18

I've started looking into string interning and it seems like a great feature. However, I haven't found a good reason why you would want to create a string using the String constructor. After some digging I came up with the following; could someone confirm (or deny) whether this is a valid reason to create a string with new?

Say you have 2 strings:

String novel = "The contents of a very long novel...";
String page = new String("The contents of a single page...");

By default all string literals are stored in the string pool (such as with String novel), and by default all sub-strings of novel will be interned (assuming they are created as a string literal) to optimize memory allocation. Creating a string using the new keyword results in the string being created on the heap rather than in the string pool. A particular case when you may want to avoid interning is if you wanted to create a string that is a sub-string of a very large string literal (such as page).

For example, say you had a very large string literal (e.g. the contents of a novel) and you only wanted to process a portion of it (e.g. a single page). It may be beneficial to use the String constructor (via the new keyword) when creating the string that contains only a single page of the novel. That way the very large string may be freed from the string pool sooner, and only the string containing a single page is kept on the heap. In contrast, if you created a string literal that is an interned sub-string of the entire novel, more of the novel may be kept alive in the string pool even though you only need a small portion of it.


TL;DR: There is no good / valid reason to new a String in a modern JVM, or to call String.intern() explicitly.


Your question contains false statements of fact, and that means that the conclusions that you are drawing are incorrect.

By default all string literals are stored in the string pool (such as with String novel)

That is correct, though it is not "by default". (It is like saying "by default a square has 4 sides". Squares have 4 sides, period. There are no exceptions. And no defaults.)

and by default all sub-strings of novel will be interned (assuming they are created as a string literal) to optimize memory allocation.

Incorrect.

A String created by the String.substring() method is NOT interned. Not in current Java releases, or (AFAIK) in any previous release. (But see below.)
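A minimal sketch of the point above, assuming a current JDK (7u6 or later). The class name SubstringInterning and the shortened NOVEL text are hypothetical. It compares references with == on purpose, only to expose object identity; real code should compare contents with equals().

```java
public class SubstringInterning {
    // Hypothetical stand-in for the question's "novel" literal (a pooled constant).
    static final String NOVEL = "The contents of a very long novel...";

    // substring() builds a new String object on the heap; it is not taken from the pool.
    static boolean substringIsPooledInstance() {
        String sub = NOVEL.substring(0, 3); // contents: "The"
        return sub == "The";                // reference comparison with the pooled literal
    }

    // intern() returns the canonical pooled instance with the same contents.
    static boolean internedSubstringIsPooledInstance() {
        String sub = NOVEL.substring(0, 3);
        return sub.intern() == "The";
    }

    public static void main(String[] args) {
        System.out.println(substringIsPooledInstance());         // false
        System.out.println(internedSubstringIsPooledInstance()); // true
    }
}
```

Only the explicit intern() call puts the substring's contents in relation to the pool; substring() itself never does.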

Creating a string using the new keyword results in the string being created on the heap rather than in string pool.

Correct.
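To make the literal-versus-new distinction concrete, here is a small sketch (the class LiteralVsNew and its method names are hypothetical). Identical literals resolve to one pooled object, while new String always allocates a separate heap object, even when the contents are equal.

```java
public class LiteralVsNew {
    // Two identical literals refer to the same pooled String object.
    static boolean literalsShareInstance() {
        String a = "page";
        String b = "page";
        return a == b;
    }

    // new String(...) always creates a distinct object outside the pool.
    static boolean newIsSameObjectAsLiteral() {
        String pooled = "page";
        String fresh = new String("page");
        return fresh == pooled;
    }

    // The contents are still equal, which is what equals() checks.
    static boolean newHasEqualContents() {
        return new String("page").equals("page");
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());    // true
        System.out.println(newIsSameObjectAsLiteral()); // false
        System.out.println(newHasEqualContents());      // true
    }
}
```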

A particular case when you may want to avoid interning is if you wanted to create a string that is a sub-string of a very large string literal (such as page).

Incorrect.

I think you are confusing "interning" with something else.

Actually, in a modern JVM you always want to avoid interning. It is expensive, and it causes string objects to be (artificially) kept for longer than they need to be.

In fact, the only real reason that interning is still a thing is that it is necessary to guarantee certain semantic properties specified in the JLS about compile-time constant strings.
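One such property, from the JLS section on string literals (§3.10.5), is that strings computed by constant expressions are treated as if they were literals (and so are interned), while strings concatenated at run time are newly created. A sketch of that guarantee; the class ConstantStrings and the constant PREFIX are hypothetical names for illustration.

```java
public class ConstantStrings {
    // A constant variable: declared final and initialized with a constant expression.
    static final String PREFIX = "Hel";

    // "Hel" + "lo" is a constant expression, folded at compile time into the literal "Hello".
    static boolean literalConcatIsInterned() {
        return ("Hel" + "lo") == "Hello";
    }

    // PREFIX is a constant variable, so PREFIX + "lo" is also a constant expression.
    static boolean constantVariableConcatIsInterned() {
        return (PREFIX + "lo") == "Hello";
    }

    // A non-final local is not a constant variable, so the concatenation
    // happens at run time and produces a newly created String.
    static boolean runtimeConcatIsInterned() {
        String prefix = "Hel";
        return (prefix + "lo") == "Hello";
    }

    public static void main(String[] args) {
        System.out.println(literalConcatIsInterned());          // true
        System.out.println(constantVariableConcatIsInterned()); // true
        System.out.println(runtimeConcatIsInterned());          // false
    }
}
```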

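A modern HotSpot JVM can also deduplicate strings in the garbage collector, for strings that live long enough: duplicate String objects are made to share a single backing array. For G1 (the default collector since Java 9) this has been available since Java 8u20 and is enabled with -XX:+UseStringDeduplication. It happens transparently to application code, and only in cases where it is likely to be beneficial.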


Historic note.

In some old JVMs, there used to be a good reason to call new String in conjunction with substring. The problem was that the substring method had a "clever optimization": it created substrings that shared the backing char[] with the original string (see footnote 1). This meant that references to (small) substrings could keep the (large) backing array reachable. It was a subtle kind of memory leak. You could avoid the leak by copying the substring with new String(...).

However:

  1. The optimization was NOT interning. The substrings were created in the regular heap, and they did not have the semantics of interned strings.
  2. The problem only affected certain String use-cases. And in practice they didn't involve large String literals.
  3. The problem was solved long ago: String.substring now creates a new String with its own backing array.
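On those old JVMs the workaround looked roughly like pageOld below; on Java 7u6 and later the plain substring call already copies the characters, so the extra new String only adds an allocation. This is a hedged sketch: the class SubstringCopy and both method names are hypothetical, and the comments describe JDK 6 behavior, which is not observable from this code on a modern JDK.

```java
public class SubstringCopy {
    // Old idiom: on JDK 6, substring() shared the original char[], and the
    // String(String) copy constructor trimmed it to just the needed range,
    // so the large original array could be garbage collected.
    static String pageOld(String novel, int start, int end) {
        return new String(novel.substring(start, end));
    }

    // Java 7u6 and later: substring() already returns a String with its own
    // backing array, so no defensive copy is needed.
    static String pageModern(String novel, int start, int end) {
        return novel.substring(start, end);
    }

    public static void main(String[] args) {
        String novel = "The contents of a very long novel...";
        System.out.println(pageOld(novel, 4, 12));    // contents
        System.out.println(pageModern(novel, 4, 12)); // contents
    }
}
```

Both methods return equal contents; neither result is interned, which is the point the list above makes about this optimization having nothing to do with interning.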

In summary, using new String might have been a good idea in some cases with old Java versions, but it isn't anymore. The substring behavior was changed in Java 7 (update 6).

1 - Interestingly, the source code for String describes this as a speed optimization rather than a space optimization.
