How to convert codepoint of one charset to another in Java?

Time:11-12

I am trying to convert codepoints from one charset to another in Java.

For example, the character ř is 248 in windows-1250 and 345 in Unicode.

So I have a source charset, a source code point, and a target charset, and I want to calculate the target code point.

This may sound easy because windows-1250 is a single-byte charset, but I want it to work with any charset, such as GB2312.

I guess it can be done somehow with the Charset class, but it seems to convert only bytes, not actual code points.

Charset sourceCharset = Charset.forName("GB2312");                
int sourceCodePoint = 45257; //吧 chinese character
Charset targetCharset = Charset.forName("UTF-8");                
int targetCodePoint = ...; //???

I checked the Charset class for code point related methods, but there are only decode and encode, which work with bytes. I tried googling for something related, without success.

Thanks in advance for any help.

CodePudding user response:

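At least in Java, there is no notion of code points for character sets other than Unicode. What you call a code point in windows-1250 or GB2312 is really the encoded byte sequence read as an integer, so you have to convert the integer to a byte array and then decode those bytes to Unicode.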

    // requires: import java.nio.charset.Charset;
    Charset sourceCharset = Charset.forName("windows-1250");
    int sourceCodePoint = 248; // ř
    // windows-1250 is a single-byte charset, so one byte holds the whole value
    byte[] bytes = {(byte) sourceCodePoint};
    String targetString = new String(bytes, sourceCharset);
    int targetCodePoint = targetString.codePointAt(0);
    System.out.println("targetString = " + targetString);
    System.out.println("targetCodePoint = " + targetCodePoint);

output:

targetString = ř
targetCodePoint = 345
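
One caveat with `new String(bytes, charset)`: it silently substitutes the replacement character U+FFFD when the bytes are not valid in the source charset. If you would rather get an exception, a stricter variant can go through `CharsetDecoder`. This is only a sketch; the class and method names (`StrictDecode`, `decodeStrict`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictDecode {
    // Decodes bytes that are expected to encode exactly one character and
    // returns its Unicode code point. Throws instead of substituting U+FFFD.
    static int decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        String s = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        if (s.codePointCount(0, s.length()) != 1) {
            throw new IllegalArgumentException("expected exactly one character, got: " + s);
        }
        return s.codePointAt(0);
    }
}
```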

Chinese characters in GB2312 are represented by 2 bytes, so you need to store them in a byte array of length 2.

    // requires: import java.nio.ByteBuffer; import java.nio.charset.Charset;
    Charset sourceCharset = Charset.forName("GB2312");
    int sourceCodePoint = 45257; // 吧 Chinese character (bytes 0xB0 0xC9)
    // the (short) cast keeps the low 16 bits; putShort writes them big-endian
    byte[] bytes = ByteBuffer.allocate(2).putShort((short) sourceCodePoint).array();
    String targetString = new String(bytes, sourceCharset);
    int targetCodePoint = targetString.codePointAt(0);
    System.out.println("targetString = " + targetString);
    System.out.println("targetCodePoint = " + targetCodePoint);

output:

targetString = 吧
targetCodePoint = 21543
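
The two snippets above can be combined into one helper that also covers the "target charset" half of the question. The following is a minimal sketch, not a definitive implementation. It assumes that a "code point" in a non-Unicode charset means the encoded bytes read as a big-endian integer, that the value fits in 4 bytes, and that it has no leading zero byte. The names `CodeConverter`, `toBytes`, `fromBytes` and `convert` are hypothetical. Characters the target charset cannot represent are silently replaced by the charset's default replacement, as `String.getBytes` always does.

```java
import java.nio.charset.Charset;

final class CodeConverter {
    // Splits an integer into its significant bytes, most significant first.
    static byte[] toBytes(int code) {
        int len = 1;
        while (len < 4 && (code >>> (8 * len)) != 0) {
            len++;
        }
        byte[] out = new byte[len];
        for (int i = len - 1; i >= 0; i--) {
            out[i] = (byte) code;
            code >>>= 8;
        }
        return out;
    }

    // Packs up to 4 bytes into an integer, most significant first.
    static int fromBytes(byte[] bytes) {
        if (bytes.length > 4) {
            throw new IllegalArgumentException("encoded form is longer than 4 bytes");
        }
        int value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    // Decodes the source value to a Java String (Unicode), then re-encodes it.
    static int convert(int sourceCode, Charset source, Charset target) {
        String s = new String(toBytes(sourceCode), source);
        return fromBytes(s.getBytes(target));
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        // ř: 0xF8 in windows-1250, bytes C5 99 in UTF-8
        System.out.println(convert(248, Charset.forName("windows-1250"), utf8));
        // 吧: bytes B0 C9 in GB2312, bytes E5 90 A7 in UTF-8
        System.out.println(convert(45257, Charset.forName("GB2312"), utf8));
    }
}
```

Note that for UTF-8 as the target, the result is the UTF-8 byte sequence packed into an int, not the Unicode code point. If the Unicode code point is what you want, stop after decoding and call `codePointAt(0)` as in the snippets above.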