Get the width of Chinese strings correctly

Time:10-14

I want to draw a border around the text 这是一个测试, but I cannot get its actual width. With English text, it works perfectly.

Screenshot

Here is my analysis:

len tells me this:

这是一个测试 18
aaaaaaaaaa 10
つのだ☆HIRO 16
aaaaaaaaaa 10

runewidth.StringWidth tells me this:

这是一个测试 12
aaaaaaaaaa 10
つのだ☆HIRO 11
aaaaaaaaaa 10

package main

import "fmt"

func main() {
    // Compare the Chinese text against 10 and 9 ASCII characters.
    fmt.Println("这是一个测试 |")
    fmt.Println("aaaaaaaaaa | 10*a")
    fmt.Println()
    fmt.Println("这是一个测试 |")
    fmt.Println("aaaaaaaaa | 9*a")
    fmt.Println()
    fmt.Println("Both are not equal to the Chinese text.")
    fmt.Println("The (pipe) lines are not under each other.")
}


Question:

How can I get my box (first screenshot) to appear correctly?

CodePudding user response:

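Go strings are UTF-8 encoded, so a character takes between 1 and 4 bytes: ASCII characters take 1 byte each, while the Chinese characters in your text take 3 bytes each. That's by design, and it is why len, which counts bytes, reports 18 for six characters.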
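
To make the byte sizes concrete, here is a minimal sketch using only the standard library. The helper name runeSizes is made up for this illustration; utf8.RuneLen and utf8.RuneCountInString are the real standard library functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeSizes (hypothetical helper) returns the number of bytes
// each rune of s occupies when encoded as UTF-8.
func runeSizes(s string) []int {
	sizes := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s { // range iterates over runes, not bytes
		sizes = append(sizes, utf8.RuneLen(r))
	}
	return sizes
}

func main() {
	fmt.Println(runeSizes("a£这😀")) // [1 2 3 4]
	fmt.Println(len("这是一个测试"))    // 18: six runes of 3 bytes each
}
```

Neither the byte count nor the rune count says how many terminal columns a string occupies, so both are the wrong measure for drawing a border.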

To count the characters (runes) in a string rather than its bytes, use the standard library's unicode/utf8 package.

c := "这是一个测试"
fmt.Printf("String: %s\nLength: %d\nRune Length: %d\n", c, len(c), utf8.RuneCountInString(c))
// String: 这是一个测试
// Length: 18
// Rune Length: 6

A more basic way to count runes is a for range loop, which steps through a string rune by rune rather than byte by byte.

count := 0
for range "这是一个测试" {
    count++
}
fmt.Printf("Count=%d\n", count)
// Count=6

As for pretty-printing Chinese and English strings in tabular format, there seems to be no direct way. text/tabwriter doesn't help here either, because it assumes every character has the same width. A small hack is to use a csv writer with a tab as the separator, as follows:

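// Needs the "encoding/csv", "log" and "os" imports.
data := [][]string{
    {"这是一个测试", "|"},
    {"aaaaaaaaaa", "|"},
    {"つのだ☆HIRO", "|"},
    {"aaaaaaaaaa", "|"},
}

// Write tab-separated rows; the terminal's tab stops do the aligning.
w := csv.NewWriter(os.Stdout)
w.Comma = '\t'

for _, row := range data {
    if err := w.Write(row); err != nil {
        log.Fatal(err)
    }
}

// Flush the buffered rows and report any write error.
w.Flush()
if err := w.Error(); err != nil {
    log.Fatal(err)
}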

This should print the data as expected. Unfortunately, StackOverflow doesn't render the output the way my terminal does, but you can run the code in the Go Playground to see it.

Note: This works only when the strings have similar display widths, so that every one of them ends before the same tab stop. For longer or more varied strings, you'd need a further work-around.
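
One possible work-around for longer strings is to pad each cell with spaces up to a fixed number of terminal columns instead of relying on tab stops. The sketch below is only an illustration under two assumptions: the terminal draws CJK characters two cells wide, and the small range table in isWide is good enough for the text at hand. isWide, cellWidth and padRight are hypothetical helpers written for this example, and the table is a rough simplification of the Unicode East Asian Width data; a library such as the runewidth package mentioned in the question covers the real data.

```go
package main

import (
	"fmt"
	"strings"
)

// isWide reports whether r lies in one of a few East Asian wide ranges.
// Assumption: this simplified table stands in for the full Unicode data.
func isWide(r rune) bool {
	switch {
	case r >= 0x1100 && r <= 0x115F, // Hangul Jamo
		r >= 0x2E80 && r <= 0x303E, // CJK radicals and punctuation
		r >= 0x3041 && r <= 0x33FF, // Hiragana, Katakana, CJK symbols
		r >= 0x3400 && r <= 0x4DBF, // CJK Extension A
		r >= 0x4E00 && r <= 0x9FFF, // CJK Unified Ideographs
		r >= 0xAC00 && r <= 0xD7A3, // Hangul syllables
		r >= 0xF900 && r <= 0xFAFF, // CJK compatibility ideographs
		r >= 0xFF00 && r <= 0xFF60, // fullwidth forms
		r >= 0xFFE0 && r <= 0xFFE6: // fullwidth signs
		return true
	}
	return false
}

// cellWidth estimates how many terminal columns s occupies:
// two per wide rune, one per any other rune.
func cellWidth(s string) int {
	w := 0
	for _, r := range s {
		if isWide(r) {
			w += 2
		} else {
			w++
		}
	}
	return w
}

// padRight appends spaces until s occupies at least width columns.
func padRight(s string, width int) string {
	if n := width - cellWidth(s); n > 0 {
		return s + strings.Repeat(" ", n)
	}
	return s
}

func main() {
	for _, s := range []string{"这是一个测试", "aaaaaaaaaa", "つのだ☆HIRO"} {
		fmt.Println(padRight(s, 14) + "|")
	}
}
```

For the three strings above, cellWidth returns 12, 10 and 11, which matches the runewidth.StringWidth numbers in the question. Whether the pipes actually line up still depends on the terminal font drawing wide characters at exactly twice the width of narrow ones.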

CodePudding user response:

Your problem is (as mkopriva points out in comments) a display issue, not amenable to being resolved by any sort of counting trick.

The same problem appears in English whenever text is displayed in a variable-pitch (proportional) font rather than a monospace one. That is, compare:

mmmm, tasty
iiii, tasty?

with:

    mmmm, tasty
    iiii, tasty?

(assuming you are reading this answer in a browser!). We don't have to print Chinese characters, or even leave plain ASCII, to run into the problem!

What you need is a monospaced display font for your Chinese text, or perhaps some software to typeset it in tabular form, and how you get that is ... another question entirely.
