```go
var int32s = []int32{
	8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
}
fmt.Println("word: ", string(int32s))
```
```js
let int32s = [8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26]
let str = String.fromCharCode.apply(null, int32s);
console.log("word: ", str);
```
The two results above are not the same (some characters come out empty or different). Is there any way to modify the Go code so that it generates the same result as the JS code?
CodePudding user response:
To cite the docs on `String.fromCharCode()`:

> The static `String.fromCharCode()` method returns a string created from the specified sequence of UTF-16 code units.
So each number in your `int32s` array is interpreted as a 16-bit integer providing a Unicode code unit, so that the whole sequence is interpreted as a series of code units forming a UTF-16-encoded string.
I'd stress the last point because, judging from the naming of the variable (`int32s`), whoever authored the JS code appears to have an incorrect idea of what is happening there.
Now back to the Go counterpart. Go does not have built-in support for UTF-16 in its strings; they are normally encoded using UTF-8 (though they are not required to be, but let's not digress). Go also provides the `rune` data type, which is an alias for `int32`.
A rune is a Unicode code point, that is, a number able to hold a complete Unicode character.
(I'll get back to this fact and its relation to the JS code in a moment.)
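To illustrate the alias relationship, here is a minimal, self-contained sketch (the character is arbitrary and not taken from the question):

```go
package main

import "fmt"

func main() {
	// rune is an alias for int32: both names denote the same type,
	// so no conversion is needed between them.
	var r rune = 'é' // U+00E9
	var i int32 = r
	fmt.Println(r, i) // 233 233: both print the numeric code point
}
```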
Now, what's wrong with your `string(int32s)` is that it interprets your slice of `int32`s in the same way as a `[]rune` (remember that `rune` is an alias for `int32`), so it takes each number in the slice to represent a single Unicode character and produces a string of them.
(This string is internally encoded as UTF-8, but this fact is not really relevant to the problem.)
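Here is a quick sketch of that conversion (the values are arbitrary, not the asker's data):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	// Each element of the slice becomes one Unicode code point in the result.
	vals := []int32{72, 253} // U+0048 'H', U+00FD 'ý'
	s := string(vals)

	fmt.Println(s)                         // Hý
	fmt.Println(utf8.RuneCountInString(s)) // 2 code points
	fmt.Println(len(s))                    // 3 bytes: 'ý' takes 2 bytes in UTF-8
}
```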
In other words, the difference is this:
- The JS code interprets the array as a sequence of 16-bit values representing a UTF-16-encoded string and converts it to some internal string representation.
- The Go code interprets the slice as a sequence of 32-bit Unicode code points and produces a string containing these code points.
The Go standard library provides a package for dealing with UTF-16 encoding, `unicode/utf16`, and we can use it to do what the JS code does: decode a UTF-16-encoded sequence into Unicode code points, which we can then convert to a Go string:
```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	var uint16s = []uint16{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	// Decode the UTF-16 code units into Unicode code points (runes),
	// then convert those runes to a (UTF-8-encoded) Go string.
	runes := utf16.Decode(uint16s)
	fmt.Println("word: ", string(runes))
}
```
(Note that I've changed the type of the slice to `[]uint16` and renamed it accordingly. Also, I've decoded the source slice into an explicitly named variable; this is done for clarity, to highlight what's happening.)
This code produces the same gibberish as the JS code does in the Firefox console.
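For what it's worth, the two interpretations really diverge only once surrogate pairs come into play. Here is a minimal sketch (using U+1F600, an arbitrary emoji rather than the asker's data) of the difference between treating the numbers as code points and decoding them as UTF-16 code units:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	// U+1F600 (😀) is encoded in UTF-16 as the surrogate pair 0xD83D, 0xDE00.
	units := []uint16{0xD83D, 0xDE00}

	// Treated as individual code points: surrogate halves are not valid
	// code points on their own, so Go encodes each as U+FFFD.
	fmt.Printf("%+q\n", string([]rune{0xD83D, 0xDE00})) // "\ufffd\ufffd"

	// Decoded as UTF-16: the pair combines into a single code point.
	fmt.Printf("%+q\n", string(utf16.Decode(units))) // "\U0001f600"
}
```

Decoding with `utf16.Decode` therefore keeps the Go code correct for any UTF-16 input, including characters outside the Basic Multilingual Plane.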