I am trying to parse what should be JSON into a Python dict. However, the file I am working with is not valid JSON, because the quotes around keys and values are often missing.
HJSON seems to be what I'm looking for; however, it raises an error if I pass any value other than null or an integer.
Playing around with it using some 'JSON' values I have to work with:
import hjson
# EXAMPLE 1
working = hjson.loads('{ date_closed: null }') # <- THIS WORKS!
print(working)
OrderedDict([('date_closed', None)])
# EXAMPLE 2
works_too = hjson.loads('{ date_closed: 42 }') # <- THIS WORKS!
print(works_too)
OrderedDict([('date_closed', 42)])
# EXAMPLE 3
not_working = hjson.loads('{ date_closed: yes }') # <- ERRORS!
~/hjson/decoder.py in scanKeyName(s, end, encoding, strict)
278
279 if ch == '':
--> 280 raise HjsonDecodeError("Bad key name (eof)", s, end);
281 elif ch == ':':
282 if begin == end:
HjsonDecodeError: Bad key name (eof): line 1 column 21 (char 20)
# EXAMPLE 4
# Using different key name
also_not_working = hjson.loads('{ date_opened: yes }') # <- ERRORS with identical error message as above
# Same key with a different value, showing it's a 'value' error rather than a 'key' error
this_works = hjson.loads('{ date_opened: null }') # <- THIS WORKS!
print(this_works)
OrderedDict([('date_opened', None)])
# EXAMPLE 5
doesnt_work = hjson.loads('{ date_opened: None }') # <- ERRORS with identical error message as above
The error message seems incorrect. The key name is not the problem (the same key sometimes works); the value is. The only values HJSON seems able to parse are integers (the value 42 works) and null.
What am I missing here?
Answer:
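I was fiddling around with this and reading the HJSON spec. Based on the examples there (including those in the Try section), I believe I figured it out. The spec doesn't spell this out, so someone can correct me if I'm wrong, but HJSON seems to require the opening and closing braces, { and }, to be on separate lines, at least in the Python implementation. The reason seems to be that an unquoted (quoteless) string value runs to the end of the line. In { date_closed: yes }, the closing } becomes part of the value "yes }", and the parser then reaches the end of the input while looking for the next key. That is why the error says "Bad key name (eof)". null, true, false and numbers are recognized before the }, which is why null and 42 worked. For example, I confirmed that this straightforward usage parses without issues: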
import hjson

print(hjson.loads('''
{
testing_123: hello world
}
'''))
# now it works! prints out:
# OrderedDict([('testing_123', 'hello world')])
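The rule can be modeled with a short scanner. This is a hypothetical, simplified model written for illustration, not hjson's own code, and the name scan_unquoted_value is made up. In the model, null, true, false and numbers may end at a , } or ]. Any other unquoted value is a quoteless string that runs to the end of the line:

```python
import re

# Hypothetical, simplified model of how an Hjson parser reads an unquoted value.
# This is NOT the hjson library's code; it only illustrates the rule that
# null/true/false and numbers may end at ',', '}' or ']', while any other
# unquoted value is a quoteless string that runs to the end of the line.
NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")


def scan_unquoted_value(text, start=0):
    """Return (value, end) for the unquoted value starting at text[start]."""
    end = start
    while True:
        ch = text[end] if end < len(text) else ""  # "" marks end of input
        at_eol = ch in ("\r", "\n", "")
        if at_eol or ch in ",}]":
            chunk = text[start:end].strip()
            if chunk == "null":
                return None, end
            if chunk == "true":
                return True, end
            if chunk == "false":
                return False, end
            if NUMBER.fullmatch(chunk):
                is_float = any(c in chunk for c in ".eE")
                return (float(chunk) if is_float else int(chunk)), end
            if at_eol:
                # Quoteless string: keeps any ',', '}' or ']' seen on this line.
                return chunk, end
        end += 1


print(scan_unquoted_value("42 }"))    # (42, 3)
print(scan_unquoted_value("yes }"))   # ('yes }', 5)
print(scan_unquoted_value("yes \n}")) # ('yes', 4)
```

In this model, "42 }" and "null }" stop cleanly at the brace. "yes }" (or "None }") swallows it. Once the } is moved to the next line, "yes" comes back as expected.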
So in your case, I suppose the simplest fix, if you don't want to put the braces on separate lines manually, would be to create a wrapper function, loads, defined as follows:
import hjson

def loads(string, decoder=hjson.loads):
    # Move every { and } onto its own line, then parse with HJSON
    return decoder(string.replace('{', '{\n').replace('}', '\n}'))
I can now confirm that both of the failing cases above parse as originally expected:
working_now = loads('{ date_closed: yes }')
print(working_now)
also_working = loads('{ date_opened: yes }')
print(also_working)
Out:
OrderedDict([('date_closed', 'yes')])
OrderedDict([('date_opened', 'yes')])
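One caveat with the replace-based wrapper is that it also rewrites braces inside quoted strings, so a value like "a{b}" would get newlines inserted into it. A slightly more careful sketch could skip over quoted text. It assumes values only use double-quoted strings with backslash escapes, with no single-quoted or ''' multiline strings. The helper name add_brace_newlines is hypothetical, and hjson is only imported when no decoder is passed:

```python
def add_brace_newlines(text):
    """Put a newline after every '{' and before every '}' that is outside a
    double-quoted string. Hypothetical helper; assumes "..." strings with
    backslash escapes and no single-quoted or ''' multiline strings."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this char was escaped; keep going
            elif ch == "\\":
                escaped = True           # next char is escaped
            elif ch == '"':
                in_string = False        # closing quote
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
        elif ch == "{":
            out.append("{\n")
        elif ch == "}":
            out.append("\n}")
        else:
            out.append(ch)
    return "".join(out)


def loads(text, decoder=None):
    """Variant of the wrapper above; decoder defaults to hjson.loads."""
    if decoder is None:
        import hjson  # third-party package: pip install hjson
        decoder = hjson.loads
    return decoder(add_brace_newlines(text))


print(add_brace_newlines('{ note: "a{b}" }'))
```

Neither version helps when several key-value pairs share a line. { a: yes, b: no } still gives a the quoteless value "yes, b: no", because only a line break ends a quoteless string.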