I'm trying to save text files using the Chinese character encoding GB2312. According to this document, GB2312 supports Cyrillic characters. Unfortunately, Java can't save Cyrillic characters in GB2312 encoding. I used the code below.
Question: Doesn't Java's encoder support all the characters that GB2312 supports? How can I see all the characters a specific encoder supports?
Files.write(Path.of("output_gb2312.txt"), List.of("АБВГДЕЁЖЗИЙКЛМНОӨПРСТУҮФХЦЧШЩЪЫЬЭЮЯ"), Charset.forName("GB2312"));
Output:
Exception in thread "main" java.nio.charset.UnmappableCharacterException: Input length = 1
	at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:275)
	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:307)
	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
	at java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:132)
	at java.base/java.io.OutputStreamWriter.write(OutputStreamWriter.java:205)
	at java.base/java.io.BufferedWriter.flushBuffer(BufferedWriter.java:120)
	at java.base/java.io.BufferedWriter.close(BufferedWriter.java:268)
	at java.base/java.nio.file.Files.write(Files.java:3587)
Answer:
The characters Ө (U+04E8 CYRILLIC CAPITAL LETTER BARRED O) and Ү (U+04AE CYRILLIC CAPITAL LETTER STRAIGHT U) aren't part of the GB 2312 character set. Remove them from your string and your code will work.
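As for the second part of the question: as far as I know there's no API that hands you a charset's full repertoire, but you can ask the encoder directly with CharsetEncoder.canEncode(char). The sketch below (class and method names are made up for illustration) reports which characters of a string GB2312 can't encode, and lists every character in a Unicode range that it can. It assumes the text contains only BMP characters (no surrogate pairs), which holds for Cyrillic.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class FindUnmappable {
    // Returns the characters of text that the charset's encoder cannot encode.
    // Assumes BMP-only text: a lone surrogate char would also be reported as unmappable.
    static String unmappable(String text, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        StringBuilder bad = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!enc.canEncode(c)) {
                bad.append(c);
            }
        }
        return bad.toString();
    }

    // Returns every char in the inclusive range [from, to] that the charset can encode.
    static String encodableInRange(char from, char to, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        StringBuilder ok = new StringBuilder();
        for (int c = from; c <= to; c++) {
            if (enc.canEncode((char) c)) {
                ok.append((char) c);
            }
        }
        return ok.toString();
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        String text = "АБВГДЕЁЖЗИЙКЛМНОӨПРСТУҮФХЦЧШЩЪЫЬЭЮЯ";
        System.out.println("Unmappable in text: " + unmappable(text, gb2312));
        // U+0400..U+04FF is the Unicode Cyrillic block.
        System.out.println("Encodable Cyrillic: " + encodableInRange('\u0400', '\u04FF', gb2312));
    }
}
```

Swapping in another range (or 0x0000..0xFFFF, skipping surrogates) gives a rough picture of everything in the BMP that a given encoder accepts.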
Files.write(Path.of("output_gb2312.txt"), List.of("АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"), Charset.forName("GB2312"));
Alternatively, GB 18030, the successor to GB 2312, can encode those two characters (and the rest of Unicode), if your Java installation supports it.
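A minimal sketch of the GB 18030 route, assuming the runtime provides a charset named "GB18030" (Charset.forName throws UnsupportedCharsetException otherwise). It writes the original string, Ө and Ү included, and reads it back to check that nothing was lost. The class, method and output file names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Gb18030Demo {
    // Writes text as a single line in the given charset, then reads that line back.
    static String roundTrip(String text, Path file, Charset cs) throws IOException {
        Files.write(file, List.of(text), cs);
        return Files.readAllLines(file, cs).get(0);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this Java installation ships the GB18030 charset.
        Charset gb18030 = Charset.forName("GB18030");
        String text = "АБВГДЕЁЖЗИЙКЛМНОӨПРСТУҮФХЦЧШЩЪЫЬЭЮЯ";
        String back = roundTrip(text, Path.of("output_gb18030.txt"), gb18030);
        System.out.println("Round trip intact: " + back.equals(text));
    }
}
```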
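Or you can set things up to replace unmappable characters instead of throwing an exception. Files.write() uses an encoder that reports unmappable characters as errors, which is why you get the exception; switching that behavior off is more cumbersome than a single Files.write() call: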
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class Demo {
    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        // A fresh CharsetEncoder reports unmappable characters by default,
        // so explicitly switch it to REPLACE (the charset's replacement bytes).
        CharsetEncoder enc = gb2312.newEncoder().onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (FileOutputStream f = new FileOutputStream("output_gb2312.txt");
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(f, enc))) {
            out.write("АБВГДЕЁЖЗИЙКЛМНОӨПРСТУҮФХЦЧШЩЪЫЬЭЮЯ");
            out.newLine();
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }
    }
}
See the CodingErrorAction documentation for the available options for handling an unmappable character.
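To see what the three CodingErrorAction options actually do, here's a small sketch (the class and method names are made up) that encodes a short string containing Ө with each action and decodes the bytes back. REPORT throws, which matches what Files.write() does; REPLACE substitutes the charset's replacement bytes; IGNORE silently drops the character. String.getBytes(Charset) always replaces, so it's another short way to get the REPLACE behavior.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class ErrorActionDemo {
    // Encodes text using the given action for unmappable characters,
    // then decodes the bytes so the effect is visible as a String.
    static String encodeDecode(String text, Charset cs, CodingErrorAction action)
            throws CharacterCodingException {
        ByteBuffer bytes = cs.newEncoder()
                .onUnmappableCharacter(action)
                .encode(CharBuffer.wrap(text));
        return cs.decode(bytes).toString();
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        String text = "ОӨП"; // Cyrillic O, barred O (unmappable), Pe
        CodingErrorAction[] actions = {
            CodingErrorAction.REPLACE, CodingErrorAction.IGNORE, CodingErrorAction.REPORT
        };
        for (CodingErrorAction action : actions) {
            try {
                System.out.println(action + ": " + encodeDecode(text, gb2312, action));
            } catch (CharacterCodingException e) {
                // REPORT surfaces the problem as an UnmappableCharacterException.
                System.out.println(action + ": " + e);
            }
        }
        // String.getBytes(Charset) always uses the charset's replacement, never throws.
        System.out.println("getBytes: " + new String(text.getBytes(gb2312), gb2312));
    }
}
```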