How can I determine the width of a Unicode character

Time:02-23

My friend and I are programming our own console in Java, but we have problems aligning the lines correctly because the width of Unicode characters cannot be determined exactly. As a result, not only the line containing the Unicode character but also the following lines are shifted.

Is there a way to determine the width of Unicode characters?

Screenshots of the problem can be found below.

This is how it should look: https://abload.de/img/richtigslkmg.jpeg

This is an example in Terminal: https://abload.de/img/terminal7dj5o.jpeg

This is an example in PowerShell: https://abload.de/img/powershelln7je0.jpeg

This is an example in Visual Studio Code: https://abload.de/img/visualstudiocode4xkuo.jpeg

This is an example in Putty: https://abload.de/img/putty0ujsk.png

Answer:

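There is the method Character::charCount(int codePoint), which helps define the "width" of a Unicode symbol in terms of the number of UTF-16 char values it occupies (not the number of columns it takes up on screen):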

Determines the number of char values needed to represent the specified character (Unicode code point). If the specified character is equal to or greater than 0x10000, then the method returns 2. Otherwise, the method returns 1.
This method doesn't validate the specified character to be a valid Unicode code point. The caller must validate the character value using isValidCodePoint if necessary.

Parameters:
codePoint - the character (Unicode code point) to be tested.
Returns:
2 if the character is a valid supplementary character; 1 otherwise.
Since:
1.5
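A small demonstration of the method quoted above may help. This sketch picks U+0041 and U+1F600 (an emoji) as sample code points; they are illustrative choices, not from the question. Keep in mind that charCount measures storage in char units only: a character such as U+4E2D (中) has a charCount of 1 but typically occupies two columns in a terminal, so it does not answer the display-width question by itself.

```java
public class CharCountDemo {
    public static void main(String[] args) {
        int latinA = 'A';            // U+0041, inside the Basic Multilingual Plane
        int grinningFace = 0x1F600;  // U+1F600, a supplementary character (emoji)

        // charCount reports how many UTF-16 char values encode the code point.
        System.out.println(Character.charCount(latinA));       // 1
        System.out.println(Character.charCount(grinningFace)); // 2

        // charCount does not validate its argument, so validate separately.
        int bogus = 0x110000; // one past the highest code point, U+10FFFF
        System.out.println(Character.isValidCodePoint(bogus)); // false
        System.out.println(Character.charCount(bogus));        // 2, despite being invalid
    }
}
```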

Answer:

Your question does not show any code, so I can only guess what you are doing and what the problem might be.

Avoid char

I am guessing that your goal is to append a certain number of NUMBER SIGN characters as needed to make a fixed-length row of text.

I am guessing the problem is that you are using the legacy char type, or its wrapper class Character. The char type has been essentially broken since Java 2. As a 16-bit value, char is physically incapable of representing most characters.
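To make the char problem concrete, here is a minimal sketch using a sample string (an assumption for illustration: "A" followed by U+1F600, written as its surrogate pair). A single char can hold only half of the emoji, while iterating code points yields whole characters.

```java
public class CharVsCodePoint {
    public static void main(String[] args) {
        // "A", then U+1F600 GRINNING FACE written as its UTF-16 surrogate pair.
        String text = "A\uD83D\uDE00";

        System.out.println(text.length());                         // 3 char values
        System.out.println(text.codePointCount(0, text.length())); // 2 characters

        // A single char holds only half of the emoji: a lone high surrogate.
        char second = text.charAt(1);
        System.out.println(Character.isHighSurrogate(second));     // true

        // Iterating code points yields whole characters instead.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        // U+0041
        // U+1F600
    }
}
```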

Use code point numbers

Instead, use code point integer numbers when working with individual characters. A code point is the number permanently assigned to each of the more than 140,000 characters defined in Unicode.

Starting with Java 5, a variety of code-point-related methods have been added to various classes: String, StringBuilder, Character, etc.
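A few of those methods, one or more per class, are sketched below on a sample string chosen for illustration ("A", U+1F600, "B"). Note that String#codePoints itself arrived later, in Java 8, as a default method on CharSequence; the methods used here all exist since Java 5.

```java
public class CodePointMethods {
    public static void main(String[] args) {
        String text = "A\uD83D\uDE00B"; // "A", U+1F600, "B"

        // String: read the code point that starts at a given char index.
        int cp = text.codePointAt(1);
        System.out.println(cp); // 128512 (0x1F600)

        // String: convert a code point offset (2 = third character) to a char index.
        int charIndex = text.offsetByCodePoints(0, 2);
        System.out.println(charIndex); // 3, because the emoji uses 2 chars

        // Character: classify a whole code point rather than a char.
        System.out.println(Character.isLetter(text.codePointAt(charIndex))); // true ('B')

        // StringBuilder: append whole code points, never half a surrogate pair.
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(0x1F600).appendCodePoint('!');
        System.out.println(sb.length()); // 3 chars for 2 code points
    }
}
```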

Here we use String#codePoints to get an IntStream of code points, one element for each character in the source. And we use StringBuilder#appendCodePoint to collect the code points for our final result string.

final int targetLength = 10;
final int fillerCodePoint = "#".codePointAt( 0 ); // Annoying zero-based index counting.
String input = "           
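The snippet above is cut off, so here is a self-contained sketch of the approach the answer describes: take the code points of the input and pad with a filler code point up to a fixed length. The helper name padRight and the sample input are hypothetical, and truncating over-long input is a design choice of this sketch, not something the answer states.

```java
import java.util.stream.IntStream;

public class PadToLength {
    // Hypothetical helper: returns exactly targetLength code points, taken from
    // input first and then filled with fillerCodePoint. Longer input is truncated.
    static String padRight(String input, int targetLength, int fillerCodePoint) {
        return IntStream.concat(
                        input.codePoints(),                        // one int per character
                        IntStream.generate(() -> fillerCodePoint)) // endless filler
                .limit(targetLength)                               // stop at the target
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        final int targetLength = 10;
        final int fillerCodePoint = "#".codePointAt(0);
        String input = "Cat\uD83D\uDE00"; // hypothetical sample: "Cat" plus U+1F600

        String output = padRight(input, targetLength, fillerCodePoint);
        System.out.println(output); // "Cat", the emoji, then six '#'
        System.out.println(output.codePointCount(0, output.length())); // 10
        System.out.println(output.length()); // 11 char values
    }
}
```

This counts code points, not screen columns. A terminal typically draws an emoji or a CJK character two columns wide and a combining mark zero columns wide, so rows built this way can still look misaligned depending on the terminal.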