How to parse invalid JSON containing an invalid number

Time:02-11

I work with a legacy customer who sends me webhook events. Sometimes their system sends me a value that looks like this:

[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]

I am using Python's json.loads to parse the data sent to me. Here ts is not a valid number, so Python raises json.decoder.JSONDecodeError whenever I try to parse this string.

I am fine with getting None in the ts field if it cannot be parsed.

What would be a smart (& possibly generic) way to solve this problem?

CodePudding user response:

This may not be so generic, but you can try loading the string with yaml (PyYAML), which keeps the unparseable value as a string:

import yaml

s = '[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074","ts":16XX445656000}]'
yaml.safe_load(s)

Output:

[{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
  'ecc': '0X6',
  'cph': 'X1X4X77074',
  'ts': '16XX445656000'}]
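
Note that the output above keeps ts as a string rather than None, which is what the question asked for. A minimal sketch of a post-processing step follows; it assumes the parsed value is a list of dicts as shown, and normalize_ts is a hypothetical helper name, not part of any library:

```python
def normalize_ts(records, key="ts"):
    """Return copies of the records with non-integer `key` values set to None.

    Hypothetical helper: a missing key also ends up as None.
    """
    cleaned = []
    for record in records:
        value = record.get(key)
        # bool is a subclass of int, so exclude it explicitly
        is_int = isinstance(value, int) and not isinstance(value, bool)
        cleaned.append({**record, key: value if is_int else None})
    return cleaned


# The structure printed above, as returned by the YAML loader
parsed = [{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
           'ecc': '0X6',
           'cph': 'X1X4X77074',
           'ts': '16XX445656000'}]

print(normalize_ts(parsed))
```

Other keys are left untouched, so string fields such as ecc and cph keep their values.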

CodePudding user response:

If the problem is always in the ts key, and this value is always a string of numbers and letters, you could just remove it before trying to parse:

import json
import re

jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""

# Strip the whole "ts" key/value pair (and the comma before it) before parsing
jstr_sanitized = re.sub(r',?\s*\"ts\":[A-Z0-9]+', "", jstr)
jobj = json.loads(jstr_sanitized)

# [{'id': 'LXKhRA3RHtaVBhnczVRJLdr', 'ecc': '0X6', 'cph': 'X1X4X77074'}]

Regex explanation:

,?\s*\"ts\":[A-Z0-9]+

,?                      Zero or one commas
  \s*                   Any number of whitespace characters
     \"ts\":            Literally "ts":
            [A-Z0-9]+   One or more uppercase letters or digits
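
If you would rather keep the key and get None for it, a variation is to replace only an invalid bare value with null and leave valid numbers alone. This is a rough sketch, not a general JSON repair tool: null_out_bad_value is a hypothetical helper, and it assumes the text "ts": never appears inside a string value.

```python
import json
import re

# Grammar of a JSON number, used to decide whether a bare token is valid
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def null_out_bad_value(raw, key="ts"):
    """Replace an unquoted, invalid value of `key` with null (hypothetical helper)."""
    # Group 1: the key and colon; group 2: a bare token up to the next delimiter
    pattern = re.compile(r'("%s"\s*:\s*)([^\s,\]}"\[{]+)' % re.escape(key))

    def fix(match):
        prefix, token = match.groups()
        if token in ("true", "false", "null") or _JSON_NUMBER.fullmatch(token):
            return match.group(0)  # already valid JSON, leave untouched
        return prefix + "null"

    return pattern.sub(fix, raw)


raw = '[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]'
fixed = json.loads(null_out_bad_value(raw))
print(fixed[0]["ts"])  # None
```

Quoted strings, objects, and arrays after the key are not matched at all, so only bare tokens are ever rewritten.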

Alternatively, you could catch the JSONDecodeError and look at its pos attribute for the offending character. Then, you could either remove just that character and try again, or look for the next space, comma, or bracket and remove characters until that point before you try again.

jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
while True:
    try:
        jobj = json.loads(jstr)
        break
    except json.JSONDecodeError as ex:
        # Drop the single offending character and try again
        jstr = jstr[:ex.pos] + jstr[ex.pos + 1:]

This mangles the output so that the ts value is now a valid integer (what is left after removing the Xs), but since you don't care about that value anyway, it should be fine:

[{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
  'ecc': '0X6',
  'cph': 'X1X4X77074',
  'ts': 16445656000}]

Since you'd end up repeatedly re-parsing the initial valid part, this is probably not a great idea if you have a huge JSON string, or if there are lots of places that could raise an error, but it should be fine for the kind of example you have shown.
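
The other option mentioned above, removing everything up to the next space, comma, or bracket, could be sketched as follows. Here the whole offending token is replaced with null so that the field comes back as None. loads_lenient and max_repairs are hypothetical names, and the sketch assumes the errors come from bare tokens like the one shown; it does not try to repair broken strings or missing commas.

```python
import json
import string

# Characters that can be part of a bare (unquoted) token
_TOKEN_CHARS = set(string.ascii_letters + string.digits + ".+-")


def loads_lenient(text, max_repairs=100):
    """Parse JSON, replacing unparseable bare tokens with null (hypothetical helper)."""
    for _ in range(max_repairs):
        try:
            return json.loads(text)
        except json.JSONDecodeError as ex:
            # Widen from the error position to cover the whole token
            start = end = ex.pos
            while start > 0 and text[start - 1] in _TOKEN_CHARS:
                start -= 1
            while end < len(text) and text[end] in _TOKEN_CHARS:
                end += 1
            # Nothing to replace, or already replaced once: give up
            if start == end or text[start:end] == "null":
                raise
            text = text[:start] + "null" + text[end:]
    raise ValueError("too many repairs needed")


jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
print(loads_lenient(jstr))
# [{'id': 'LXKhRA3RHtaVBhnczVRJLdr', 'ecc': '0X6', 'cph': 'X1X4X77074', 'ts': None}]
```

The retry cap and the re-raise guard keep the loop from spinning forever on input it cannot fix, which the single-character version above does not guard against. The re-parsing cost still applies.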
