I understand that Java uses UTF-16 for the char data type. However, as of Java 18, the default charset for the standard Java APIs is UTF-8. Does this change have any impact on the encoding of the char data type? I also understand that UTF-8 is a variable-width encoding that uses up to 4 bytes per character. What is the size of the char data type after Java 18? Does it still adhere to UTF-16, or has it moved to UTF-8?
CodePudding user response:
No, the change in default character encoding does not affect the internals of char/Character, nor does it affect the internals of String. A char is still a 16-bit UTF-16 code unit.
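As a minimal sketch of that point (the class and method names here are made up for illustration), the size of char is a language-level constant, while UTF-8 only comes into play when text is encoded to bytes:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
final class CharSizeDemo {
    // Width of one char in bits; fixed by the language, independent of the default charset.
    static int charBits() {
        return Character.SIZE;
    }

    // Number of bytes the string occupies once encoded as UTF-8 (for example, for file I/O).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE, written as an escape
        System.out.println(charBits());    // 16
        System.out.println(s.length());    // 1 (one UTF-16 code unit in memory)
        System.out.println(utf8Length(s)); // 2 (two bytes when encoded as UTF-8)
    }
}
```

The in-memory length and the encoded byte length are separate things; JEP 400 concerns only the latter.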
I suggest you read the official documentation, JEP 400: UTF-8 by Default. The motivation and the details are explained thoroughly.
The change made in Java 18 affects mainly input/output. So this includes the older APIs and classes for reading and writing files. Some of the newer APIs and classes were already defaulting to UTF-8. JEP 400 seeks to make this default consistent throughout the bundled libraries.
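One way to stay independent of the default, on any JDK version, is to pass the charset explicitly. The following is only a sketch under that assumption: the class name, the roundTrip helper, and the temporary file are illustrative. The FileReader(File, Charset) constructor and Files.writeString used here require Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the JDK.
final class ExplicitCharsetDemo {
    // Writes the text to a temporary file in the given charset, then reads the
    // first line back with a FileReader that is told the same charset explicitly.
    static String roundTrip(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.writeString(tmp, text, cs);
                try (BufferedReader in = new BufferedReader(new FileReader(tmp.toFile(), cs))) {
                    return in.readLine();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // What the JVM uses when no charset is given; UTF-8 on JDK 18+ unless overridden.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(roundTrip("caf\u00E9", StandardCharsets.ISO_8859_1));
    }
}
```

A FileReader constructed without a charset argument falls back to the default charset, which is exactly the behavior that changed in Java 18.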
One particular issue called out in the JEP regards Java source code files that were saved with a non-UTF-8 encoding and compiled with an earlier JDK. Recompiling on JDK 18 or later may cause problems.
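By the way, let me remind you that char/Character has been legacy since Java 5, essentially broken since Java 2. As a 16-bit type, char is physically incapable of representing most characters: a single char cannot hold any character outside the Basic Multilingual Plane, so each such character takes a surrogate pair of two char values.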
To work with individual characters, use code point integer numbers. Look for the codePoint methods added to classes including String, StringBuilder, Character, etc.
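A short sketch of the code point approach, using String.codePointCount and String.codePoints (the class and helper names are illustrative only):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the JDK.
final class CodePointDemo {
    // Counts Unicode code points rather than UTF-16 code units.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Formats each code point in the string as U+XXXX.
    static List<String> describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16.
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());         // 3 (char values)
        System.out.println(countCodePoints(s)); // 2 (characters)
        System.out.println(describe(s));        // [U+0061, U+1F600]
    }
}
```

Iterating with charAt would split the second character into two meaningless halves; codePoints() yields it as one int.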