With the release of Java 18, UTF-8 is the default charset now. Does it imply that char data type is

Time:07-14

I understand that Java uses UTF-16 for the char data type. But with the new update in Java 18, the default charset for the standard Java APIs is UTF-8. Does this update have any impact on the encoding format of the char data type? I also understand that UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character. What is the size of the char data type after Java 18? And does it still adhere to UTF-16, or has it moved to UTF-8?

Answer:

No, the change in default character encoding does not affect the internals of char/Character, nor does it affect the internals of String. A char is still a 16-bit UTF-16 code unit.
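A minimal sketch to check this yourself (the class and method names are illustrative, not from the JEP): the size of `char` is fixed by the language, and `String.length()` counts UTF-16 code units no matter what `Charset.defaultCharset()` reports. UTF-8 only comes into play when text is converted to bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharSizeDemo {
    // Bits in a char: fixed by the Java language, unaffected by the default charset.
    static int charBits() {
        return Character.SIZE;
    }

    // String.length() counts the UTF-16 code units held in memory.
    static int utf16Units(String s) {
        return s.length();
    }

    // Bytes needed when the same text is encoded as UTF-8 for I/O.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Reports UTF-8 on JDK 18+ unless overridden, e.g. with -Dfile.encoding=COMPAT.
        System.out.println("Default charset: " + Charset.defaultCharset());
        String e = "\u00E9"; // "é", written as an escape so the source encoding does not matter
        System.out.println("char bits:    " + charBits());    // 16
        System.out.println("UTF-16 units: " + utf16Units(e)); // 1
        System.out.println("UTF-8 bytes:  " + utf8Bytes(e));  // 2
    }
}
```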

I suggest you read the official documentation, JEP 400: UTF-8 by Default. The motivation and the details are explained thoroughly.

The change made in Java 18 mainly affects input/output. This includes the older APIs and classes for reading and writing files, such as FileReader and FileWriter, which previously used the platform's default encoding. Some of the newer APIs, such as the methods in java.nio.file.Files, were already defaulting to UTF-8. JEP 400 makes this default consistent throughout the bundled libraries.
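Because only the default changed, code that names its charset explicitly behaves the same before and after Java 18. A minimal sketch of that practice, using in-memory streams instead of a file so it runs anywhere (the class and helper names are hypothetical); `FileReader` and `FileWriter` have accepted a `Charset` argument in the same way since Java 11.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetIo {
    // Encodes text through a Writer with an explicit charset, so the bytes
    // do not depend on the JDK's default charset.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer has been closed (and flushed) by now
    }

    // Decodes bytes back through a Reader, again naming the charset.
    static String readUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "na\u00EFve"; // "naïve"
        byte[] bytes = writeUtf8(text);
        System.out.println(text.length() + " chars -> " + bytes.length + " UTF-8 bytes"); // 5 chars -> 6 UTF-8 bytes
        System.out.println(readUtf8(bytes).equals(text)); // true
    }
}
```

Note that the in-memory `String` is 5 chars either way; only the byte count on the I/O side depends on the charset.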

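One particular issue called out in the JEP concerns Java source code files that were saved in a non-UTF-8 encoding and compiled with an earlier JDK, which read them using the platform encoding. Recompiling them on JDK 18 or later, where javac also assumes UTF-8, may cause problems unless you pass the files' actual encoding with javac's -encoding option.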


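By the way, let me remind you that char/Character has been legacy since Java 5, and essentially broken since Java 2. As a 16-bit type, char is physically incapable of representing most characters: it cannot hold any code point above U+FFFF, so each such character occupies two chars (a surrogate pair) in a String.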
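A short illustration of that limitation, assuming U+1F600 (an emoji) as the sample character: inside a `String` it needs two `char` values, a surrogate pair, even though it is a single character. The class name is illustrative.

```java
public class SurrogateDemo {
    // U+1F600 lies outside the Basic Multilingual Plane (above U+FFFF).
    // Built from its code point so the source file's encoding does not matter.
    static final String SMILEY = new String(Character.toChars(0x1F600));

    public static void main(String[] args) {
        System.out.println(SMILEY.length());                             // 2 (UTF-16 code units)
        System.out.println(SMILEY.codePointCount(0, SMILEY.length()));   // 1 (character)
        System.out.println(Character.isHighSurrogate(SMILEY.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(SMILEY.charAt(1)));  // true
        // No single char can hold the value: 0x1F600 does not fit in 16 bits.
        System.out.println(Character.charCount(0x1F600));                // 2
    }
}
```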

To work with individual characters, use code point integer numbers. Look for the codePoint methods added to classes including String, StringBuilder, and Character.
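A minimal sketch of code point iteration, using `String.codePoints()` (Java 8+) and the older `codePointAt`/`Character.charCount` pair (Java 5+). The helper names and the sample ideograph U+20000 are assumptions for illustration; the last helper shows how a char-by-char check misses a letter outside the BMP.

```java
import java.util.stream.Collectors;

public class CodePointDemo {
    // Lists each code point of s as "U+XXXX", iterating by code point rather than by char.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Counts letters correctly: a surrogate pair is stepped over as one code point.
    static int countLettersByCodePoint(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                count++;
            }
            i += Character.charCount(cp); // 2 for supplementary code points, else 1
        }
        return count;
    }

    // Naive version: tests each char, and a lone surrogate half is never a letter.
    static int countLettersByChar(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (Character.isLetter(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "A" followed by U+20000, a CJK ideograph outside the BMP.
        String s = "A" + new String(Character.toChars(0x20000));
        System.out.println(describe(s));                // U+0041 U+20000
        System.out.println(countLettersByCodePoint(s)); // 2
        System.out.println(countLettersByChar(s));      // 1
    }
}
```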
