If I run this Go code:
package main

import (
	"encoding/json"
	"os"
)

func main() {
	json.NewEncoder(os.Stdout).Encode("\xa1") // "\ufffd"
}
I lose data, since once the Unicode replacement is done, I can no longer get back the original value. Compare with this Python code:
import json
a = '\xa1'
b = json.dumps(a) # "\u00a1"
print(json.loads(b) == a) # True
No replacement is done, so no data is lost, and the resulting JSON is still valid. Does Go have a way to encode a JSON string with escaping instead of replacement?
CodePudding user response:
This example is a false equivalence. In Python, '\xa1' is a valid Unicode string; the escape is just one of several spellings of the same character, alongside '\u00a1', '\U000000a1', chr(0xa1), '\N{INVERTED EXCLAMATION MARK}', '¡', and so on.
The Python equivalent of the Go code would be:
>>> print(json.dumps(b'\xa1'.decode(errors='replace')))
"\ufffd"
This also prints an ASCII representation of the substituted REPLACEMENT CHARACTER to stdout, the same as in Go.
CodePudding user response:
This is because "\xa1" is not a valid Unicode string. It contains the byte 0xa1, which is not valid UTF-8 by itself. The invalid byte gets replaced with U+FFFD, the "replacement character", which is used when the input is invalid.
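To see the replacement happen, here is a minimal sketch using only the standard library. The helper name marshalString is made up for this illustration; it is not part of encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// marshalString (hypothetical helper) returns the JSON text for s.
func marshalString(s string) string {
	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// "\xa1" is the single byte 0xa1, which is not valid UTF-8 on its own.
	fmt.Println(utf8.ValidString("\xa1")) // false

	// The encoder substitutes U+FFFD for the invalid byte.
	fmt.Println(marshalString("\xa1")) // "\ufffd"

	// "\u00a1" is the code point U+00A1 and is written out as-is.
	fmt.Println(marshalString("\u00a1")) // "¡"
}
```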
If you want to encode the Unicode character U+00A1, write it as "\u00a1". If you want arbitrary data to round-trip through JSON, you will have to represent it a different way (base64-encoding it, for example).
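One way to get the base64 representation without extra work is to pass the data as a []byte, since encoding/json writes byte slices as base64-encoded strings. The sketch below assumes the consumer of the JSON knows the field holds base64; encodeBytes and decodeBytes are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeBytes (hypothetical helper) marshals raw bytes to JSON.
// encoding/json emits a []byte as a base64-encoded JSON string,
// so no byte is replaced.
func encodeBytes(raw []byte) (string, error) {
	out, err := json.Marshal(raw)
	return string(out), err
}

// decodeBytes (hypothetical helper) reverses encodeBytes.
func decodeBytes(doc string) ([]byte, error) {
	var raw []byte
	err := json.Unmarshal([]byte(doc), &raw)
	return raw, err
}

func main() {
	doc, err := encodeBytes([]byte("\xa1"))
	if err != nil {
		panic(err)
	}
	fmt.Println(doc) // "oQ=="

	back, err := decodeBytes(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(back) == "\xa1") // true
}
```

The cost is that the JSON no longer shows readable text, so this fits binary payloads better than strings that are mostly valid already.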
Python just works differently: in Python, the \xa1 escape sequence is U+00A1. Again, in Go, \xa1 is the byte 0xa1, which is not a valid Unicode string by itself and cannot be encoded as a JSON string.
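If what you actually want is Python's reading, where each \xNN stands for the code point with that value, you can do the conversion yourself before encoding. This is only a sketch, and bytesAsCodePoints is a made-up helper name; it treats the input as Latin-1, which is an assumption about your data rather than something Go does for you.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bytesAsCodePoints (hypothetical helper) maps every byte of s to the
// code point with the same numeric value, i.e. it decodes s as Latin-1.
func bytesAsCodePoints(s string) string {
	runes := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ {
		runes = append(runes, rune(s[i]))
	}
	return string(runes)
}

func main() {
	converted := bytesAsCodePoints("\xa1")
	fmt.Println(converted == "\u00a1") // true

	out, err := json.Marshal(converted)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // "¡"
}
```

Note that Go writes the character itself rather than the \u00a1 escape; both are valid JSON for the same string. To get the original bytes back, the reader has to apply the reverse mapping from code points to bytes.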