JSON with additional content at top of file

Time:06-07

I am trying to read this URL into R as JSON: https://comtrade.un.org/Data/cache/reporterAreas.json

I see that there is additional content at the top of the file, wrapping the content I am after. A sample of the file looks as follows:

{
  "more": false,
  "results": [
    {
      "id": "all",
      "text": "All"
    },
    {
      "id": "4",
      "text": "Afghanistan"
    },
    {
      "id": "8",
      "text": "Albania"
    }
  ]
}

I tried reading it with:

x <- GET(url)
fromJSON(rawToChar(x$content))

It doesn't work; it throws the error unexpected character '<ef>'. I assume the parser is choking on the [.

I also tried download.file(url, file) followed by fromJSON(file), but that threw the error unexpected character 'r', which I am guessing comes from "results".

I assume this is just some header formatting for the JSON (apologies, I don't do much with JSON files), and that there is an option for dealing with it in either GET() or fromJSON(), but I can't find anything in the docs. None of the examples I have seen of pulling JSON from a URL use this format.

When I call class(rawToChar(x$content)) it returns "character", so I could clean the string by removing {"more": false,"results": [ and ]}, but that seems hacky for what looks like a standard format.

If someone can show me how to import this correctly, I would welcome it. I would also welcome a more useful question title that describes the issue better.

Answer:

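The <ef> character is the first byte of a byte-order mark (BOM, U+FEFF) encoded in UTF-8; the full sequence is <ef><bb><bf>. So the parser is failing on the BOM at the very start of the file, not on the {"more": false, "results": [...]} wrapper, which is ordinary JSON.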
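For the in-memory route from the question (GET() followed by rawToChar()), a minimal sketch is to drop those three bytes from the raw response body before converting it to text. strip_utf8_bom() is a hypothetical helper written for illustration (it is not part of httr or jsonlite), and the simulated body below stands in for x$content.

```r
# Hypothetical helper: drop a leading UTF-8 byte-order mark (EF BB BF)
# from a raw vector, such as the `content` field of an httr response.
strip_utf8_bom <- function(bytes) {
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]  # keep everything after the BOM
  }
  bytes  # unchanged if no BOM was present
}

# Simulated response body: a BOM followed by a tiny JSON document
# (stands in for x$content from GET(url)).
body <- c(as.raw(c(0xef, 0xbb, 0xbf)),
          charToRaw('{"more": false, "results": []}'))

txt <- rawToChar(strip_utf8_bom(body))
substr(txt, 1, 1)  # the first character is now "{" instead of the BOM
```

With the real request this would become txt <- rawToChar(strip_utf8_bom(x$content)), after which jsonlite::fromJSON(txt)$results should give the id/text pairs as a data frame; the "more"/"results" wrapper is simply indexed away rather than cut out of the string.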

When I download the file with download.file() and then parse it with jsonlite::read_json(), it gives a warning about the BOM but appears to read the rest of the file without error, so that approach is worth trying.
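If you would rather avoid the BOM warning on the file route, one option (a sketch, not the only way) is to read the downloaded file through a connection opened with R's "UTF-8-BOM" encoding, which removes a leading BOM while reading. The test file below is written locally so the example runs offline; it stands in for the file saved by download.file(), and the jsonlite step only runs if that package is installed.

```r
# Write a small test file that starts with a UTF-8 BOM, standing in for the
# file saved by download.file(url, path). writeBin() writes the bytes as-is.
path <- tempfile(fileext = ".json")
bytes <- c(as.raw(c(0xef, 0xbb, 0xbf)),
           charToRaw('{"more": false, "results": [{"id": "4", "text": "Afghanistan"}]}'))
writeBin(bytes, path)

# The "UTF-8-BOM" connection encoding drops a leading BOM while reading.
con <- file(path, encoding = "UTF-8-BOM")
txt <- paste(readLines(con, warn = FALSE), collapse = "\n")  # no trailing newline
close(con)

# Parse with jsonlite when it is installed; `results` becomes a data frame.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- jsonlite::fromJSON(txt)
  print(parsed$results)
}
```

For the real file, replace the test bytes with download.file(url, path, mode = "wb"), where mode = "wb" keeps the bytes unchanged on Windows.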
