python3 - json.loads for a string that contains " in a value

Time:11-09

I'm trying to convert a string that contains a dict into a dict object using json, but some of the values in the data contain a " character. Example:

import json
string = '{"key1":"my"value","key2":"my"value2"}'
js = json.loads(string, strict=False)

This raises json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 13 (char 12), because " is the string delimiter and the values contain extra ones.
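To see where the parser gives up, the exception can be inspected directly. This is a minimal sketch using the `msg` and `pos` attributes of `json.JSONDecodeError`; the variable names `broken` and `failure` are only illustrative.

```python
import json

broken = '{"key1":"my"value","key2":"my"value2"}'

failure = None
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    # err.pos is the 0-based index at which parsing failed
    failure = (err.msg, err.pos)

print(failure)
# Everything before that index was accepted: the parser read "my" as the
# complete value of key1 and then expected a comma.
print(broken[:failure[1]])
```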

What is the best way to achieve my goal?

The only solution I have found is to run several .replace calls on the string: replace each legitimate " with a placeholder pattern until only the illegal " remain, replace those with another placeholder, then restore the legitimate ". After that I can call json.loads and turn the remaining placeholder back into the illegal ". But there must be a better way.

Example:

string = '{"key1":"my"value","key2":"my"value2"}'
string = string.replace('{"','__pattern_1')
string = string.replace('}"','__pattern_2')
...
...
string = string.replace('"','__pattern_42')
string = string.replace('__pattern_1','{"')
string = string.replace('__pattern_2','}"')
...
...
js = json.loads(string, strict=False)

CodePudding user response:

The variable string is not a valid JSON string. The correct string should be:

string = '{"key1":"my\\"value","key2":"my\\"value2"}'

CodePudding user response:

The problem is that the string is not valid JSON.

In the string '{"key1": "my"value", "key2": "my"value2"}', the value of key1 ends at "my", and the characters value" that follow it violate the format.

You can use character escaping. Valid JSON would look like:

{"key1": "my\"value", "key2": "my\"value2"}

Since you are writing it as a Python string literal, you then need to escape the backslashes themselves:

string = '{"key1": "my\\"value", "key2": "my\\"value2"}'

There is a lot of educational material online about character escaping. I recommend checking it out if anything is unclear.
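If the data is produced in Python, letting `json.dumps` do the escaping avoids writing the backslashes by hand. The sketch below is only an illustration of the escaping described above; the names `data`, `encoded` and `literal` are assumptions, and a raw string (`r'...'`) is used as an alternative to doubling the backslashes.

```python
import json

# The values contain a literal double quote.
data = {"key1": 'my"value', "key2": 'my"value2'}

# json.dumps escapes the inner quotes as \" automatically.
encoded = json.dumps(data)
print(encoded)

# A raw string keeps the backslashes as written, so no doubling is needed.
literal = r'{"key1": "my\"value", "key2": "my\"value2"}'

# Both forms parse back to the original dict.
decoded = json.loads(literal)
print(decoded)
```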

CodePudding user response:

This should work. I simply replace all the expected double quotes with placeholders, remove the unwanted double quotes, and then convert the placeholders back. Note that the unwanted quotes are dropped, not preserved.

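import json

def fix_json_string(st):
    # Swap every structural quote pattern for a placeholder.
    st = st.replace('","', "!!")
    st = st.replace('":"', "--")
    st = st.replace('{"', "{{")
    st = st.replace('"}', "}}")
    # Any quote still present is a stray one inside a value: drop it.
    st = st.replace('"', '')
    # Restore the structural quotes in reverse order.
    st = st.replace('}}', '"}')
    st = st.replace('{{', '{"')
    st = st.replace('--', '":"')
    st = st.replace('!!', '","')
    return st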

broken_string = '{"key1":"my"value","key2":"my"value2"}'
fixed_string = fix_json_string(broken_string)
print(fixed_string)
js = json.loads(fixed_string)  # parse with json.loads instead of eval
print(json.dumps(js))


Output:
{"key1":"myvalue","key2":"myvalue2"} # repaired string
{"key1": "myvalue", "key2": "myvalue2"} # parsed dict, serialized back to JSON
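If the stray quotes need to be kept and not dropped, one alternative is to escape them before parsing. The following is a heuristic sketch and not the method used in the answer above: it assumes a flat object with no whitespace around separators, and it treats a quote as structural only when it directly follows `{`, `:` or `,`, or directly precedes `:`, `,` or `}`. The helper name `escape_stray_quotes` is hypothetical.

```python
import json
import re

# Assumption: any double quote that is neither preceded by { : ,
# nor followed by : , } is a stray quote inside a value.
STRAY_QUOTE = re.compile(r'(?<![{:,])"(?![:,}])')

def escape_stray_quotes(text):
    """Backslash-escape quotes that look like data, not delimiters."""
    return STRAY_QUOTE.sub(lambda m: '\\"', text)

broken_string = '{"key1":"my"value","key2":"my"value2"}'
repaired = escape_stray_quotes(broken_string)
result = json.loads(repaired)
print(result)
```

This breaks down as soon as a value itself contains a sequence such as `",` or `":`, so it is only suitable when the shape of the data is known in advance. Fixing the producer of the string so that it emits valid JSON remains the more reliable option.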