I am developing a web app with Spring Boot on Windows 10 Pro (German edition), using Spring Tool Suite as my IDE. When I start the application from the console with Maven:
mvn clean package -Pproduction && mvn spring-boot:run -Dfile.encoding=UTF8
and call string.toCharArray() on the String "Jürgen" and print each character, I get this:
length: 6
pos: 0, val: J
pos: 1, val: ³
pos: 2, val: r
pos: 3, val: g
pos: 4, val: e
pos: 5, val: n
But when I start the application from within the Spring Tool Suite IDE via the Boot Dashboard (right click, restart), it prints this:
length: 6
pos: 0, val: J
pos: 1, val: ü
pos: 2, val: r
pos: 3, val: g
pos: 4, val: e
pos: 5, val: n
I want the same behaviour in the console as in STS. How can I get it, and why does the problem occur at all? The value is displayed correctly in the GUI; only the console output differs.
CodePudding user response:
The Windows console encoding doesn't come from Java properties; it is part of the Windows configuration, or possibly the IDE configuration. Hence the difference. The big question is whether the data is correct and merely displayed incorrectly, or whether the data itself got corrupted. To find out, I suggest converting your String to a sequence of Unicode escapes, which will show whether the data itself is intact. I wrote my own utility that converts a String to a sequence of Unicode escapes and vice versa. The code may look like this:
private static void encodingTest() {
String testStr1 = "Jürgen";
String encoded1 = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(testStr1);
String restored = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(encoded1);
System.out.println(testStr1 + "\n" + encoded1 + "\n" + restored);
}
The output (for intact data) looks like this:
Jürgen
\u004a\u00fc\u0072\u0067\u0065\u006e
Jürgen
The class StringUnicodeEncoderDecoder is part of the open-source MgntUtils library, written and maintained by me. If you want to use it, the library is available as a Maven artifact on Maven Central and on GitHub (with source code and Javadoc). You can also get a similar result with code like this:
System.out.println(arg.chars().mapToObj(c -> String.format("\\u%04x", c)).collect(Collectors.joining()));
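For anyone who wants to run the library-free variant directly, here is a minimal self-contained sketch. The class name UnicodeEscapeDemo and the method name toUnicodeEscapes are made up for illustration and are not part of MgntUtils.

```java
import java.util.stream.Collectors;

public class UnicodeEscapeDemo {

    // Renders each UTF-16 code unit of the input as a backslash-u escape
    // with four lowercase hex digits. (Hypothetical helper, not from MgntUtils.)
    public static String toUnicodeEscapes(String input) {
        return input.chars()
                .mapToObj(c -> String.format("\\u%04x", c))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The umlaut is written as an escape so the result does not depend on
        // the encoding used to compile this source file.
        System.out.println(toUnicodeEscapes("J\u00fcrgen"));
    }
}
```

If the second escape in the output is 00fc, the String holds a real "ü" and only the console rendering is off.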
CodePudding user response:
This is a case of mojibake (shown in Python, since it is widely readable):
print( 'Jürgen'.encode('cp1252').decode('cp850'))
J³rgen
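Since the application is written in Java, the same round trip can be sketched there as well. This is only an illustration of the suspected mismatch, not a trace of what Maven actually does; the class and method names are hypothetical, and it assumes the JDK in use ships the IBM850 charset (most full JDK distributions do).

```java
import java.nio.charset.Charset;

public class MojibakeDemo {

    // Encodes the text as windows-1252 bytes, then wrongly decodes those
    // bytes as code page 850, mirroring the Python one-liner above.
    public static String misdecode(String text) {
        Charset written = Charset.forName("windows-1252");
        // Assumption: IBM850 is available in this JDK; forName throws otherwise.
        Charset readBack = Charset.forName("IBM850");
        return new String(text.getBytes(written), readBack);
    }

    public static void main(String[] args) {
        String garbled = misdecode("J\u00fcrgen");
        // Print code points instead of raw text so the result is readable
        // regardless of the console's own code page.
        garbled.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

The byte 0xFC means "ü" in windows-1252 but "³" in code page 850, which is exactly the substitution seen in the console output.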
I'd guess that your command prompt's default code page is 850. Check this using chcp.com as follows:
chcp
Active code page: 850
Solution: change the code page to cp1252 before running mvn, as follows:
chcp 1252
mvn clean package -Pproduction && mvn spring-boot:run -Dfile.encoding=UTF8