I'm unmarshalling into a struct that has a time.Time field named Foo:
type AStructWithTime struct {
Foo time.Time `json:"foo"`
}
My expectation is that, after unmarshalling, I get something like this:
var expectedStruct = AStructWithTime{
Foo: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
}
Working Example 1: Plain JSON Objects into Structs
This works fine when working with plain json strings:
func Test_Unmarshalling_DateTime_From_String(t *testing.T) {
jsonStrings := []string{
"{\"foo\": \"2022-09-26T21:00:00Z\"}", // trailing Z = UTC offset
"{\"foo\": \"2022-09-26T21:00:00+00:00\"}", // explicit zero offset
"{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}", // \u002b is an escaped '+'
}
for _, jsonString := range jsonStrings {
var deserializedStruct AStructWithTime
err := json.Unmarshal([]byte(jsonString), &deserializedStruct)
if err != nil {
t.Fatalf("Could not unmarshal '%s': %v", jsonString, err) // doesn't happen
}
if deserializedStruct.Foo.Unix() != expectedStruct.Foo.Unix() {
t.Fatal("Unmarshalling is erroneous") // doesn't happen
}
// works; no errors
}
}
Working Example 2: JSON Array into Slice
It also works if I unmarshal the same objects from a JSON array into a slice:
func Test_Unmarshalling_DateTime_From_Array(t *testing.T) {
// these are just the same objects as above, just all in one array instead of as single objects/dicts
jsonArrayString := "[{\"foo\": \"2022-09-26T21:00:00Z\"},{\"foo\": \"2022-09-26T21:00:00+00:00\"},{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}]"
var slice []AStructWithTime // and now I need to unmarshal into a slice
unmarshalErr := json.Unmarshal([]byte(jsonArrayString), &slice)
if unmarshalErr != nil {
t.Fatalf("Could not unmarshal array: %v", unmarshalErr)
}
for index, instance := range slice {
if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
t.Fatalf("Unmarshalling failed for index %v: Expected %v but got %v", index, expectedStruct.Foo, instance.Foo)
}
}
// works; no errors
}
Not Working Example
Now I do the same unmarshalling with JSON read from a file, "test.json". Its content is the array from the working example above:
[
{
"foo": "2022-09-26T21:00:00Z"
},
{
"foo": "2022-09-26T21:00:00+00:00"
},
{
"foo": "2022-09-26T21:00:00\u002b00:00"
}
]
The code is:
func Test_Unmarshalling_DateTime_From_File(t *testing.T) {
fileName := "test.json"
fileContent, readErr := os.ReadFile(filepath.FromSlash(fileName))
if readErr != nil {
t.Fatalf("Could not read file %s: %v", fileName, readErr)
}
if len(fileContent) == 0 { // also catches an empty, non-nil slice
t.Fatalf("File %s must not be empty", fileName)
}
var slice []AStructWithTime
unmarshalErr := json.Unmarshal(fileContent, &slice)
if unmarshalErr != nil {
// ERROR HAPPENS HERE
// Could not unmarshal file content test.json: parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
t.Fatalf("Could not unmarshal file content %s: %v", fileName, unmarshalErr)
}
for index, instance := range slice {
if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
t.Fatalf("Unmarshalling failed for index %v in file %s. Expected %v but got %v", index, fileName, expectedStruct.Foo, instance.Foo)
}
}
}
It fails because of the escaped '+'.
parsing time ""2022-09-26T21:00:00\u002b00:00"" as ""2006-01-02T15:04:05Z07:00"": cannot parse "\u002b00:00"" as "Z07:00"
Question: Why does unmarshalling the time.Time field fail when it's being read from a file but works when the same json is read from an identical string?
CodePudding user response:
I believe that this is a bug in encoding/json.
Both the JSON grammar at https://www.json.org and the IETF definition of JSON at RFC 8259, Section 7: Strings provide that a JSON string may contain Unicode escape sequences:
7. Strings
The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be uppercase or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
. . .
To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G-clef character (U+1D11E) may be represented as "\uD834\uDD1E".
string = quotation-mark *char quotation-mark

char = unescaped /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C              ; \

quotation-mark = %x22      ; "

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
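To illustrate the quoted rules, here is a minimal sketch showing that encoding/json resolves these escapes when the target is a plain Go string, including the surrogate pair from the RFC's own example. The helper name decodeJSONString is made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString decodes a single JSON string token into a Go string.
// The helper name is hypothetical; it only wraps json.Unmarshal.
func decodeJSONString(token string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(token), &s)
	return s, err
}

func main() {
	// Raw (backtick) literals keep the backslashes, so the JSON decoder
	// really sees the escape sequences.
	tokens := []string{`"\u005C"`, `"\uD834\uDD1E"`, `"\u002b00:00"`}
	for _, token := range tokens {
		s, err := decodeJSONString(token)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s decodes to %q\n", token, s)
	}
}
```

The last token is the offset part of the timestamp from the question; decoded as a string it comes out as "+00:00".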
The JSON document from the original post
{
"foo": "2022-09-26T21:00:00\u002b00:00"
}
parses and deserializes perfectly fine in Node.js using JSON.parse().
Here's an example demonstrating the bug:
package main
import (
"encoding/json"
"fmt"
"time"
)
var document []byte = []byte(`
{
"value": "2022-09-26T21:00:00\u002b00:00"
}
`)
func main() {
deserializeJsonAsTime()
deserializeJsonAsString()
}
func deserializeJsonAsTime() {
fmt.Println("")
fmt.Println("Deserializing JSON as time.Time ...")
type Widget struct {
Value time.Time `json:"value"` // no space after the colon, or the tag is ignored
}
expected := Widget{
Value: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
}
actual := Widget{}
err := json.Unmarshal(document, &actual)
switch {
case err != nil:
fmt.Println("Error deserializing JSON as time.Time")
fmt.Println(err)
case !actual.Value.Equal(expected.Value): // Equal compares instants and ignores the Location
fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
default:
fmt.Println("Success")
}
}
func deserializeJsonAsString() {
fmt.Println("")
fmt.Println("Deserializing JSON as string ...")
type Widget struct {
Value string `json:"value"` // no space after the colon, or the tag is ignored
}
expected := Widget{
Value: "2022-09-26T21:00:00+00:00",
}
actual := Widget{}
err := json.Unmarshal(document, &actual)
switch {
case err != nil:
fmt.Println("Error deserializing JSON as string")
fmt.Println(err)
case actual.Value != expected.Value:
fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
default:
fmt.Println("Success")
}
}
When run (see https://goplay.tools/snippet/fHQQVJ8GfPp), we get:
Deserializing JSON as time.Time ...
Error deserializing JSON as time.Time
parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
Deserializing JSON as string ...
Success
Since deserializing a JSON string containing Unicode escape sequences as a string yields the expected result (the escape sequence is turned into the expected rune/byte sequence), the problem seemingly lies in the code that handles deserialization to time.Time. It does not appear to deserialize to a string and then parse the string value as a time.Time.
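One detail that likely explains why the string-based tests in the question pass: inside an interpreted Go string literal (double quotes), \u002b is an escape that the Go compiler itself resolves, so json.Unmarshal receives a literal '+' and never sees an escape sequence. A file, like a raw (backtick) literal, keeps the six characters as they are. A minimal sketch of the difference, with constant names chosen only for illustration:

```go
package main

import "fmt"

// Both constants are typed with the same characters between the quotes.
const (
	interpreted = "21:00:00\u002b00:00" // the Go compiler turns \u002b into '+'
	raw         = `21:00:00\u002b00:00` // backticks keep the six characters \u002b
)

func main() {
	fmt.Println(interpreted) // prints 21:00:00+00:00
	fmt.Println(raw)         // prints 21:00:00\u002b00:00
	fmt.Println(interpreted == raw)
}
```

So the two "identical" inputs are not byte-for-byte identical by the time they reach json.Unmarshal, which is why only the file-based test hits the escape handling in time.Time.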
CodePudding user response:
As Brits pointed out, this is a known issue: "time: UnmarshalJSON does not respect escaped unicode characters". We could solve the two errors that json.Unmarshal hits on the document {"value": "2022-09-26T21:00:00\u002b00:00"} in this way.
JSON fails when escaping '+' as '\u002b'
- Solution: convert the escaped Unicode to UTF-8 through strconv.Unquote
cannot parse "\\u002b00:00\"" as "Z07:00"
- Solution: parse the time with the format "2006-01-02T15:04:05-07:00" (stdNumColonTZ // "-07:00", from src/time/format.go)
- If you want to parse the time zone from it, time.ParseInLocation could be used.
In order to make it compatible with json.Unmarshal, we could define a new type, utf8Time:
type utf8Time struct {
time.Time
}
func (t *utf8Time) UnmarshalJSON(data []byte) error {
str, err := strconv.Unquote(string(data))
if err != nil {
return err
}
tmpT, err := time.Parse("2006-01-02T15:04:05-07:00", str)
if err != nil {
return err
}
*t = utf8Time{tmpT}
return nil
}
func (t utf8Time) String() string {
return t.Format("2006-01-02 15:04:05.999999999 -0700 MST")
}
Then call json.Unmarshal:
type MyDoc struct {
Value utf8Time `json:"value"`
}
var document = []byte(`{"value": "2022-09-26T21:00:00\u002b00:00"}`)
func main() {
var mydoc MyDoc
err := json.Unmarshal(document, &mydoc)
if err != nil {
fmt.Println(err)
}
fmt.Println(mydoc.Value)
}
Output
2022-09-26 21:00:00 +0000 +0000