Unmarshalling `time.Time` from JSON fails when escaping '+' as `\u002b` in files but works with plain strings

Time:09-30

I'm unmarshalling into a struct that has a time.Time field named Foo:

type AStructWithTime struct {
    Foo time.Time `json:"foo"`
}

My expectation is that after unmarshalling I get something like this:

var expectedStruct = AStructWithTime{
    Foo: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
}

Working Example 1: Plain JSON Objects into Structs

This works fine with plain JSON strings:

func Test_Unmarshalling_DateTime_From_String(t *testing.T) {
    jsonStrings := []string{
        "{\"foo\": \"2022-09-26T21:00:00Z\"}",           // trailing Z = UTC offset
        "{\"foo\": \"2022-09-26T21:00:00+00:00\"}",      // explicit zero offset
        "{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}", // \u002b is an escaped '+'
    }
    for _, jsonString := range jsonStrings {
        var deserializedStruct AStructWithTime
        err := json.Unmarshal([]byte(jsonString), &deserializedStruct)
        if err != nil {
            t.Fatalf("Could not unmarshal '%s': %v", jsonString, err) // doesn't happen
        }
        if deserializedStruct.Foo.Unix() != expectedStruct.Foo.Unix() {
            t.Fatal("Unmarshalling is erroneous") // doesn't happen
        }
        // works; no errors
    }
}

Working Example 2: JSON Array into Slice

It also works if I unmarshal the same objects from a JSON array into a slice:

func Test_Unmarshalling_DateTime_From_Array(t *testing.T) {
    // these are just the same objects as above, just all in one array instead of as single objects/dicts
    jsonArrayString := "[{\"foo\": \"2022-09-26T21:00:00Z\"},{\"foo\": \"2022-09-26T21:00:00+00:00\"},{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}]"
    var slice []AStructWithTime // and now I need to unmarshal into a slice
    unmarshalErr := json.Unmarshal([]byte(jsonArrayString), &slice)
    if unmarshalErr != nil {
        t.Fatalf("Could not unmarshal array: %v", unmarshalErr)
    }
    for index, instance := range slice {
        if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
            t.Fatalf("Unmarshalling failed for index %v: Expected %v but got %v", index, expectedStruct.Foo, instance.Foo)
        }
    }
    // works; no errors
}

Not Working Example

Now I do the same unmarshalling with JSON read from a file, test.json. Its content is the array from the working example above:

[
  {
    "foo": "2022-09-26T21:00:00Z"
  },
  {
    "foo": "2022-09-26T21:00:00+00:00"
  },
  {
    "foo": "2022-09-26T21:00:00\u002b00:00"
  }
]

The code is:

func Test_Unmarshalling_DateTime_From_File(t *testing.T) {
    fileName := "test.json"
    fileContent, readErr := os.ReadFile(filepath.FromSlash(fileName))
    if readErr != nil {
        t.Fatalf("Could not read file %s: %v", fileName, readErr)
    }
    if len(fileContent) == 0 { // os.ReadFile returns an empty, non-nil slice for an empty file
        t.Fatalf("File %s must not be empty", fileName)
    }
    var slice []AStructWithTime
    unmarshalErr := json.Unmarshal(fileContent, &slice)
    if unmarshalErr != nil {
        // ERROR HAPPENS HERE
        // Could not unmarshal file content test.json: parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
        t.Fatalf("Could not unmarshal file content %s: %v", fileName, unmarshalErr)
    }
    for index, instance := range slice {
        if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
            t.Fatalf("Unmarshalling failed for index %v in file %s. Expected %v but got %v", index, fileName, expectedStruct.Foo, instance.Foo)
        }
    }
}

It fails because of the escaped '+':

parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"

Question: Why does unmarshalling the time.Time field fail when the JSON is read from a file but work when the same JSON is read from an identical string?
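One part of the premise can be checked directly (a minimal sketch added for illustration; the constant names are made up): in both working examples the JSON sits in Go interpreted string literals ("..."), and the Go compiler itself replaces `\u002b` with `+` when it compiles such a literal. So json.Unmarshal never sees the escape in those tests, whereas a file, or a raw string literal in backticks, keeps the six characters `\u002b`.

```go
package main

import "fmt"

// interpretedLiteral is a Go interpreted string literal: the compiler turns
// the \u002b escape into a single '+' byte.
const interpretedLiteral = "2022-09-26T21:00:00\u002b00:00"

// rawLiteral is a raw string literal: the six characters \u002b are kept,
// just like the bytes os.ReadFile returns for test.json.
const rawLiteral = `2022-09-26T21:00:00\u002b00:00`

func main() {
    fmt.Println(interpretedLiteral, len(interpretedLiteral)) // 2022-09-26T21:00:00+00:00 25
    fmt.Println(rawLiteral, len(rawLiteral))                 // 2022-09-26T21:00:00\u002b00:00 30
}
```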

Answer:

I believe that this is a bug in encoding/json.

Both the JSON grammar at https://www.json.org and the IETF definition of JSON at RFC 8259, Section 7: Strings provide that a JSON string may contain Unicode escape sequences:

7. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be uppercase or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

. . .

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G-clef character (U+1D11E) may be represented as "\uD834\uDD1E".


string = quotation-mark *char quotation-mark

char = unescaped /
       escape (
          %x22 /          ; "    quotation mark  U+0022
          %x5C /          ; \    reverse solidus U+005C
          %x2F /          ; /    solidus         U+002F
          %x62 /          ; b    backspace       U+0008
          %x66 /          ; f    form feed       U+000C
          %x6E /          ; n    line feed       U+000A
          %x72 /          ; r    carriage return U+000D
          %x74 /          ; t    tab             U+0009
          %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C              ; \

quotation-mark = %x22      ; "

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
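encoding/json's own string decoding follows these rules. A small check, as a sketch (decodeJSONString is a hypothetical helper), decodes the RFC's two examples and the escape from the question into plain Go strings:

```go
package main

import (
    "encoding/json"
    "fmt"
)

// decodeJSONString decodes a single JSON string literal into a Go string,
// letting encoding/json apply the escape rules from RFC 8259, Section 7.
func decodeJSONString(lit string) (string, error) {
    var s string
    err := json.Unmarshal([]byte(lit), &s)
    return s, err
}

func main() {
    for _, lit := range []string{
        `"\u005C"`,       // a single reverse solidus
        `"\uD834\uDD1E"`, // G clef (U+1D11E) as a UTF-16 surrogate pair
        `"\u002b00:00"`,  // the escape from the question
    } {
        s, err := decodeJSONString(lit)
        fmt.Printf("%s -> %q (err: %v)\n", lit, s, err)
    }
}
```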

Consider the JSON document from the original post:

{
  "foo": "2022-09-26T21:00:00\u002b00:00"
}

It parses and deserializes perfectly fine in Node.js using JSON.parse().

Here's an example demonstrating the bug:

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// A raw string literal (backticks) keeps \u002b as six literal characters,
// just as they appear in a file read from disk.
var document = []byte(`
{
  "value": "2022-09-26T21:00:00\u002b00:00"
}
`)

func main() {
    deserializeJsonAsTime()
    deserializeJsonAsString()
}

// deserializeJsonAsTime decodes "value" into a time.Time field.
func deserializeJsonAsTime() {
    fmt.Println("")
    fmt.Println("Deserializing JSON as time.Time ...")

    type Widget struct {
        Value time.Time `json:"value"`
    }

    expected := Widget{
        Value: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
    }
    actual := Widget{}
    err := json.Unmarshal(document, &actual)

    switch {
    case err != nil:
        fmt.Println("Error deserializing JSON as time.Time")
        fmt.Println(err)
    case !actual.Value.Equal(expected.Value): // compare instants; != would also compare the Location
        fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
    default:
        fmt.Println("Success")
    }
}

// deserializeJsonAsString decodes the same "value" into a plain string field.
func deserializeJsonAsString() {
    fmt.Println("")
    fmt.Println("Deserializing JSON as string ...")

    type Widget struct {
        Value string `json:"value"`
    }

    expected := Widget{
        Value: "2022-09-26T21:00:00+00:00",
    }
    actual := Widget{}
    err := json.Unmarshal(document, &actual)

    switch {
    case err != nil:
        fmt.Println("Error deserializing JSON as string")
        fmt.Println(err)
    case actual.Value != expected.Value:
        fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
    default:
        fmt.Println("Success")
    }
}

When run (see https://goplay.tools/snippet/fHQQVJ8GfPp), we get:

Deserializing JSON as time.Time ...
Error deserializing JSON as time.Time
parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"

Deserializing JSON as string ...
Success

Since deserializing a JSON string containing Unicode escape sequences into a string yields the correct result (the escape sequence is turned into the expected rune/byte sequence), the problem seemingly lies in the code that handles deserialization to time.Time: it does not appear to deserialize to a string first and then parse that string value as a time.Time.

Answer:

As Brits pointed out, this is a known issue: "time: UnmarshalJSON does not respect escaped unicode characters". The two errors that occur when calling json.Unmarshal on the string {"value": "2022-09-26T21:00:00\u002b00:00"} can be solved as follows:

  • JSON fails when escaping '+' as '\u002b'

    • Solution: convert the escaped Unicode to UTF-8 with strconv.Unquote
  • cannot parse "\\u002b00:00\"" as "Z07:00"

    • Solution: parse the time with the layout "2006-01-02T15:04:05-07:00"
      • stdNumColonTZ // "-07:00" from src/time/format.go
      • If you want to parse the time zone from it, time.ParseInLocation could be used.

To make this work with json.Unmarshal, we can define a new type, utf8Time:

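package main

import (
    "encoding/json"
    "fmt"
    "strconv"
    "time"
)

// utf8Time wraps time.Time so that JSON \uXXXX escapes are resolved
// before the value is parsed.
type utf8Time struct {
    time.Time
}

// UnmarshalJSON unquotes the raw JSON string (turning \u002b into '+')
// and then parses it with a numeric "-07:00" offset layout.
func (t *utf8Time) UnmarshalJSON(data []byte) error {
    str, err := strconv.Unquote(string(data))
    if err != nil {
        return err
    }
    tmpT, err := time.Parse("2006-01-02T15:04:05-07:00", str)
    if err != nil {
        return err
    }
    *t = utf8Time{tmpT}
    return nil
}

// String prints the value with the same layout time.Time.String uses.
func (t utf8Time) String() string {
    return t.Format("2006-01-02 15:04:05.999999999 -0700 MST")
}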

Then call json.Unmarshal as usual:

type MyDoc struct {
    Value utf8Time `json:"value"`
}

// Raw string literal: \u002b stays escaped, as it would in a file.
var document = []byte(`{"value": "2022-09-26T21:00:00\u002b00:00"}`)

func main() {
    var mydoc MyDoc
    err := json.Unmarshal(document, &mydoc)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(mydoc.Value)
}

Output:

2022-09-26 21:00:00 +0000 +0000
