I am trying to read this URL into R as JSON: https://comtrade.un.org/Data/cache/reporterAreas.json
I see that there is additional content at the top of the file, wrapping the content I am after. A sample of the file looks as follows:
{
  "more": false,
  "results": [
    {
      "id": "all",
      "text": "All"
    },
    {
      "id": "4",
      "text": "Afghanistan"
    },
    {
      "id": "8",
      "text": "Albania"
    }
  ]
}
Trying to read using:
x <- GET(url)
fromJSON(rawToChar(x$content))
doesn't work; it throws the error: unexpected character '<ef>'. I assume this comes from the parser seeing the [.
I also tried download.file(url, file), then calling fromJSON(file), but that threw the error unexpected character 'r', which I am guessing is from "results".
I assume this is just some header formatting for the JSON (apologies, I don't do much with JSON files), and that there is an option for dealing with it either via GET() or fromJSON(), but I can't see anything in the docs. None of the examples that I have seen describing how to pull JSON from a URL have this format.
When I call class(rawToChar(x$content)), it shows as a chr vector, so I could clean it by stripping the {"more": false,"results": [ and ]}, but that seems wonky for what looks like a standard format.
If someone can show me how to import this correctly, I would welcome it. I would also welcome a more useful question title that describes this issue more effectively.
CodePudding user response:
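The <ef> byte is the first byte of a byte-order mark (BOM) encoded as UTF-8. The other two bytes are <bb> and <bf>. The wrapper object with "more" and "results" is ordinary JSON and is not what causes the error.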
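One way to confirm this is to look at the first three bytes of the response. The snippet below is only a sketch: content_bytes is a stand-in built by hand for x$content, so it runs without a network call.

```r
# Stand-in for x$content: the three UTF-8 BOM bytes followed by a short JSON string.
content_bytes <- c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw('{"more": false}'))

# Compare the first three bytes against the UTF-8 BOM (EF BB BF).
first_bytes <- head(content_bytes, 3)
has_bom <- identical(first_bytes, as.raw(c(0xef, 0xbb, 0xbf)))
```

If has_bom is TRUE for the real response, the parser is choking on the BOM rather than on the JSON itself.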
When I download the file using download.file() and then decode it using jsonlite::read_json(), it gives a warning about the BOM, but appears to read the rest of the file without an error. You should try that.
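If you would rather keep the GET() approach, a possible workaround is to drop the BOM bytes yourself before parsing. This is a minimal sketch, assuming jsonlite is installed: strip_utf8_bom is a hypothetical helper (not part of httr or jsonlite), and content_bytes again stands in for x$content using the sample from the question.

```r
library(jsonlite)

# Hypothetical helper: drop a leading UTF-8 BOM (bytes EF BB BF) from a raw vector.
strip_utf8_bom <- function(bytes) {
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  bytes
}

# Stand-in for x$content: a BOM followed by a sample of the JSON from the question.
sample_json <- '{"more": false, "results": [{"id": "all", "text": "All"}, {"id": "4", "text": "Afghanistan"}]}'
content_bytes <- c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw(sample_json))

# Parse the cleaned text; jsonlite simplifies the array of objects to a data frame.
parsed <- fromJSON(rawToChar(strip_utf8_bom(content_bytes)))
reporters <- parsed$results  # columns: id, text
```

With the real response you would pass x$content instead of content_bytes. The list you are after is then parsed$results, so there is no need to trim the {"more": false,"results": [ wrapper by hand.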