How are Strings (and in general other types/objects) stored and organized in memory in Java?

Time:10-02

I was wondering how Java stores String objects in memory, let's take this code as example:

String s1 = "Hello";
String s2 = new String("Hello");

If I understood correctly, s1 refers to "Hello", whereas s2 refers to a new portion of memory containing "Hello". So if we imagine memory divided into constants, statics, heap, and stack, we would have s1 and s2 on the stack, "Hello" in constants, and the newly created object containing "Hello" on the heap. Am I right?

If we analyze only this piece of code:

String s = new String("Hello");

What I expect it to do when the program starts is to put the "Hello" string in constants, then make a copy of this constant string, put the copy on the heap, and store the address of that heap object in s, which lives on the stack. Is this correct? Or does the heap object hold not a copy of the string but the address of the constant? But wouldn't that mean that, going back to the code with s1 and s2, both in a certain way refer to the same memory address, the only difference being that s1->"Hello" and s2->heap->"Hello"? I know I shouldn't need to ask this for Java, since the JVM handles it and I shouldn't have to care, but it's for my personal knowledge.

Answer:

There is a simple formal definition, given by JVMS §2.5.3:

The heap is the run-time data area from which memory for all class instances and arrays is allocated.

So the string objects are stored in the heap memory, by definition.

Everything else is an implementation detail and may differ between JVM implementations and even between versions.

Typically, a String object is distinct from the character contents it encapsulates. The characters are typically stored in an array. So when you create a new object via new String("Hello"), the two String instances can still share the array behind the scenes*. Some implementations may even search for equal strings and modify them to share one array if they didn’t already, to save memory; see the String Deduplication feature of Java 8.
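The part of this that is visible to a program can be sketched as follows. This is a minimal illustration, and the class name StringIdentityDemo and the helper sameObject are made up for it. It shows only object identity; whether the two String instances share one backing array cannot be observed through the public API.

```java
// Sketch: identity vs. content for a literal and a String created with new.
final class StringIdentityDemo {

    /** True if both references point to the very same object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = "Hello";             // refers to the String object of the constant
        String s2 = new String("Hello"); // always creates a new String object

        System.out.println(sameObject(s1, s2));          // false: two distinct objects
        System.out.println(s1.equals(s2));               // true: same characters
        System.out.println(sameObject(s1, s2.intern())); // true: intern() yields the constant's object
    }
}
```

So s1 and s2 never refer to the same String object, even if an implementation lets both objects point to the same array internally.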

Even for constant strings, the array is usually an ordinary heap object distinct from the constant data. Characters in a class file are stored in the “Modified UTF-8” format, which is not suitable for the String API, which provides random access based on UTF-16 units.
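To get a feeling for the difference between the two representations, here is a small sketch. It relies on DataOutputStream.writeUTF, which is documented to write modified UTF-8 preceded by a two-byte length; the class name ModifiedUtf8Demo and the helper modifiedUtf8 are made up for the example, and it does not read an actual class file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: the same text as UTF-16 units, modified UTF-8, and standard UTF-8.
final class ModifiedUtf8Demo {

    /** Encodes s as modified UTF-8 via writeUTF and drops the 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // 'A', NUL, and one emoji (U+1F600, a surrogate pair in UTF-16)
        String s = "A\u0000\uD83D\uDE00";

        System.out.println(s.length());                                // 4 UTF-16 units
        System.out.println(modifiedUtf8(s).length);                    // 9 bytes: 1 + 2 + 3 + 3
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6 bytes: 1 + 1 + 4
    }
}
```

In the modified format, the character at index i is not at a fixed byte offset, so charAt(i) could not be answered without scanning. That is why the data gets converted into an array of fixed-width units.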

So the first time the constant is used, a String object for "Hello" is created and the character data is converted into an array. In recent OpenJDK-based implementations, it is a byte[] array using either ISO-8859-1 (Latin-1) or UTF-16 for the content, which is then encapsulated by the String instance. Prior to JDK 9, it was a char[] array.
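As a rough illustration of the choice between the two encodings, the following sketch estimates the payload size of the backing array. It is not the JDK's code: fitsLatin1 and estimatedArrayBytes are hypothetical helpers, and the assumption is that a string is stored with one byte per character exactly when every UTF-16 unit is at most 0xFF.

```java
// Sketch: estimating the backing array's payload size, under the assumption
// that Latin-1 is used whenever every UTF-16 unit fits into one byte.
final class CompactStringSketch {

    /** Hypothetical helper: true if every UTF-16 unit of s is in 0x00..0xFF. */
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    /** Estimated payload bytes: one per unit for Latin-1 content, two per unit otherwise. */
    static int estimatedArrayBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(estimatedArrayBytes("Hello"));       // 5
        System.out.println(estimatedArrayBytes("Hello \u20AC")); // 14: the euro sign forces UTF-16
    }
}
```

With the older char[] representation, both strings would need two bytes per unit, that is 10 and 14 bytes of payload.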

Subsequent accesses to the same constant will use the constant data to look up the already existing object. Usually, all code locations get permanently linked to the object after the first execution, so they don’t have to look it up again on subsequent executions.
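The observable effect of this lookup is that every evaluation of the same constant yields the same object, even across classes. A minimal sketch, with made-up class and method names:

```java
// Sketch: repeated evaluation of a string constant yields one and the same object.
final class ConstantLookupDemo {

    /** Evaluates the "Hello" constant each time it is called. */
    static String greeting() {
        return "Hello";
    }

    public static void main(String[] args) {
        String first = greeting();
        String second = greeting();

        System.out.println(first == second);                 // true: same code location, same object
        System.out.println(first == OtherHolder.greeting()); // true: equal constant in another class
    }
}

// A second class containing an equal constant of its own.
final class OtherHolder {

    static String greeting() {
        return "Hello";
    }
}
```

How the JVM finds the existing object (a lookup table, a link patched into the code, or a prebuilt archive) stays invisible to the program.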

But when you use Class Data Sharing, the archive may already contain prebuilt String objects, which do not need to be converted from class file data. As said, it’s all implementation-specific.

* In the reference implementation, the String(String) constructor’s behavior changed with Java 7, update 6. Since then, it will share the array.
