I have to deal with putative JSON from a lot of different sources, and a lot of the time it seems that there is a problem with the data itself. I suspect that it sometimes isn't intended to be JSON at all; but a lot of the time it comes from a buggy tool, or it was written by hand for a quick test and has some typo in it.
Rather than ask about a specific error, I'm looking for a checklist: based on the error message, what is the most likely cause? What information is present in these error messages, and how can I use it to locate the problem in the data? Assume for these purposes that I can save the data to a temporary file for analysis, if it didn't already come from a file.
CodePudding user response:
Foreword
The only exception explicitly raised by the decoding code is json.JSONDecodeError, so the exception type does not help diagnose problems. What's interesting is the associated message. However, it is possible that decoding bytes to text fails before JSON decoding can be attempted. That is a separate issue beyond the scope of this post.
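Besides the message, the exception object carries the position of the failure. A minimal sketch of pulling that out, assuming the data is already available as a string; the helper name describe_json_error and the context parameter are made up for illustration, while msg, lineno, colno and pos are attributes of json.JSONDecodeError:

```python
import json

def describe_json_error(text, context=20):
    """Return None if text parses; otherwise a small report locating the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        start = max(exc.pos - context, 0)
        return {
            "message": exc.msg,     # e.g. "Expecting value"
            "line": exc.lineno,     # 1-based line number
            "column": exc.colno,    # 1-based column number
            "offset": exc.pos,      # 0-based index into text
            "nearby": text[start:exc.pos + context],  # text around the failure
        }
    return None

# Hypothetical broken input: the value for "b" is not valid JSON.
report = describe_json_error('{"a": 1, "b": oops}')
print(report)
```

The "nearby" slice is usually enough to spot the problem without opening the file in an editor.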
It's worth noting here that the JSON format documentation uses different terminology from Python. In particular, a portion of valid JSON data enclosed in {} is an object (not "dict") in JSON parlance, and a portion enclosed in [] is an array (not "list"). I will use JSON terminology when talking about the file contents, and Python terminology when talking about the parsed result or about data created directly by Python code.
As a general hint: use a dedicated JSON viewer to examine the file, or at least use a text editor that has some functionality to "balance" brackets (i.e., given that the insertion pointer is currently at a {, it will automatically find the matching }).
Not JSON
An error message saying Expecting value is a strong indication that the data is not intended to be JSON formatted at all. Carefully note the line and column position of the error for more information:
If the error occurs at line 1, column 1, it will be necessary to inspect the beginning of the file. It could be that the data is actually empty. If it starts with <, then that of course suggests XML rather than JSON. Otherwise, there could be some padding preceding actual JSON content. Sometimes, this is to implement a security restriction in a web environment; in other cases it's to work around a different restriction. The latter case is called JSONP (JSON with Padding). Either way, it will be necessary to inspect the data to figure out how much should be trimmed from the beginning (and possibly also the end) before parsing.
An error at any other position might be because the data is actually the repr of some native Python data structure. Data like this can often be parsed using ast.literal_eval, but it should not be considered a practical serialization format - it doesn't interoperate well with code not written in Python, and using repr can easily produce data that can't be recovered this way (or in any practical way).
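The line 1, column 1 checks above can be roughed out in code. This is only a heuristic sketch: the function names are invented for illustration, and it assumes the real payload is a JSON object or array, so that the first { or [ marks where it begins (padding that itself contains those characters will fool it).

```python
import json

def first_json_start(text):
    """Index of the first '{' or '[' in text, or -1 if there is none."""
    hits = [i for i in (text.find("{"), text.find("[")) if i != -1]
    return min(hits) if hits else -1

def describe_start(text):
    """Rough classification of data that fails at line 1, column 1."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    if stripped.startswith("<"):
        return "markup"
    start = first_json_start(stripped)
    if start > 0:
        return "padding: " + repr(stripped[:start])
    return "unclear"

def parse_after_padding(text):
    """Skip everything before the first '{' or '[' and ignore what follows the value."""
    start = first_json_start(text)
    if start == -1:
        raise ValueError("no JSON object or array found")
    # raw_decode parses one value starting at the given index and stops there.
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value

padded = 'callback({"key": "value"});'  # hypothetical JSONP-style response
print(describe_start(padded))
print(parse_after_padding(padded))
```

It is still worth looking at the reported padding by eye before trusting the result.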
Note some common differences between Python's native object representations and the JSON format, to help diagnose the problem:
JSON uses only double quotes to surround strings; Python may also use single quotes, as well as triple-single ('''example''') or triple-double ("""example""") quotes.
JSON uses lowercase true and false rather than True and False to represent booleans. It uses null rather than None as a special "there is nothing here" value. Standard JSON has no way to represent infinite or not-a-number values; as an extension, Python's json module reads and writes Infinity, -Infinity and NaN for these, whereas Python's own repr shows inf and nan.
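To see these differences side by side, here is a small sketch that serializes the same made-up list both ways. The sample data is arbitrary; repr, json.dumps and ast.literal_eval are standard.

```python
import ast
import json

data = ["text", True, None]   # arbitrary sample data

as_repr = repr(data)          # Python syntax: single quotes, True, None
as_json = json.dumps(data)    # JSON syntax: double quotes, true, null

print(as_repr)
print(as_json)

# Feeding the repr to the JSON parser fails on the first single quote.
try:
    json.loads(as_repr)
    repr_error = None
except json.JSONDecodeError as exc:
    repr_error = (exc.msg, exc.lineno, exc.colno)
print(repr_error)

# ast.literal_eval understands the repr; json.loads understands the JSON.
recovered_from_repr = ast.literal_eval(as_repr)
recovered_from_json = json.loads(as_json)

# Python's extension for special floats, not part of standard JSON.
special = json.dumps(float("inf"))
print(special)
```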
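One subtlety: Expecting value can also indicate a trailing comma in an array. JSON syntax does not allow a trailing comma after listing elements or key-value pairs, although Python does. Although the comma is "extra", this will be reported as something missing (the next element) rather than something extraneous (the comma). A trailing comma in an object is typically reported as Expecting property name enclosed in double quotes instead, for the same reason. Newer Python versions may word these errors differently and name the trailing comma directly.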
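A quick way to check what a particular Python version reports for trailing commas is to try it. The sketch below deliberately shows no expected messages, since the wording depends on the interpreter version; the check helper is made up for illustration.

```python
import json

def check(text):
    """Return ("ok", value) or ("error", message, offset) for the given text."""
    try:
        return ("ok", json.loads(text))
    except json.JSONDecodeError as exc:
        return ("error", exc.msg, exc.pos)

samples = [
    '[1, 2, 3,]',   # trailing comma in an array
    '{"a": 1,}',    # trailing comma in an object
    '[1, 2, 3]',    # valid, for comparison
]

for sample in samples:
    print(sample, "->", check(sample))
```

In every version I am aware of, the reported offset points at or just after the comma, so looking one or two characters to the left of the reported column is a reasonable habit.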
An error message saying Extra data indicates that there is more text after the end of the JSON data.
If the error occurs at line 2 column 1, this strongly suggests that the data is in fact in JSONL ("JSON Lines") format - a related format wherein each line of the input is a separate JSON entity (typically an object). Handling this is trivial: just iterate over lines of the input and parse each separately, and put the results in a list. For example, use a list comprehension: [json.loads(line) for line in open_json_file]. See Loading JSONL file as JSON objects for more.
Otherwise, the extra data could be part of JSONP padding. It can be removed before parsing; or else use the .raw_decode method of the JSONDecoder class:
>>> import json
>>> example = '{"key": "value"} extra'
>>> json.loads(example) # breaks because of the extra data:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 18 (char 17)
>>> parsed, size = json.JSONDecoder().raw_decode(example)
>>> parsed
{'key': 'value'}
>>> size # amount of text that was parsed.
16
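For the JSON Lines case, a slightly more defensive variant of the list comprehension can be sketched as follows. It assumes blank lines should be skipped (a bare comprehension would fail on them with Expecting value) and that it is useful to know which line was bad; the function name load_json_lines is made up for illustration.

```python
import io
import json

def load_json_lines(lines):
    """Parse an iterable of text lines as JSON Lines, skipping blank lines."""
    records = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline at the end
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Re-raise with the line number within the whole input.
            raise ValueError(f"line {number}: {exc.msg} at column {exc.colno}") from exc
    return records

# io.StringIO stands in for an open text file here.
sample = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
print(load_json_lines(sample))  # [{'id': 1}, {'id': 2}]
```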
Invalid string literals
Error messages saying any of:
Invalid \uXXXX escape
Invalid \escape
Unterminated string starting at
Invalid control character
suggest that a string in the data isn't properly formatted, most likely due to a badly written escape code.
JSON strings can't contain control codes in strict mode (the default for parsing), so e.g. a newline must be encoded with \n. Note that the data must actually contain a backslash; when viewing a representation of the JSON data as a string, that backslash would then be doubled up (but not when, say, printing the string).
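The difference between a real newline and an escaped one is easy to mix up, so here is a small sketch with made-up sample strings. The strict=False argument to json.loads is standard; whether it is acceptable to be lenient depends on where the data comes from.

```python
import json

raw = '{"text": "line one\nline two"}'        # a real newline inside the JSON string
escaped = '{"text": "line one\\nline two"}'   # a backslash followed by n in the data

# Strict mode (the default) rejects the raw control character.
try:
    json.loads(raw)
    strict_error = None
except json.JSONDecodeError as exc:
    strict_error = exc.msg
print(strict_error)

# strict=False tolerates control characters inside strings.
lenient = json.loads(raw, strict=False)
proper = json.loads(escaped)
print(lenient == proper)

print(escaped)        # printing shows a single backslash
print(repr(escaped))  # the representation shows it doubled
```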
JSON doesn't accept Python's \x or \U escapes, only \u. To represent characters outside the BMP, use a surrogate pair:
>>> json.loads('"\\ud808\\udf45"') # encodes Unicode code point 0x12345 as a surrogate pair
'