How to extract a value from a complex dictionary read from a file?

Time:08-31

I have a very strange data file, and I have no idea how to loop through its keys. Here is the file:

(The file is generated by an API server, so there is no way to change the input.)

{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}

I am trying to extract 'number' from the data, but there seems to be no way to do it. I have tried json.loads, eval(data), and various other combinations to convert it to a native Python dict. As you can see below, none of these attempts worked:

Try #1:

file = "file.json"
data = file.read() 
parsed = json.loads(data)
print(data)

Error:

AttributeError: 'str' object has no attribute 'read'

Try #2:

with open("file.json", "r") as f:
    data = f.read()
    d = ast.literal_eval(data)
    print(d)

Error:

    {'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}
               ^
SyntaxError: invalid syntax

Try #3:

with open("file.json", "r") as f:
    data = f.read()
    data = data.replace("'", '"')
    print(data)
    js = json.loads(data)
    print(js)

Error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 12 (char 11)

Try #4:

with open("file.json", "r") as f:
    data = f.read()
    data = str(data)
    print(json.dumps(data))
    js = json.loads(data)
    print(js)

Error:

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

CodePudding user response:

That is not JSON; it looks like the repr of a Python dictionary. eval would work if not for the <object at 0xc0> portion, which is not valid Python syntax. You could try getting rid of that and then calling eval.

Note that eval is quite unsafe, and only acceptable if you control where the input comes from, and are sure it won't contain anything malicious.

>>> import re

>>> data = """{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"""

>>> data_cleaned = re.sub(r"(<[^>]+>)", r"'\1'", data)
>>> data_cleaned
"{'client': '<object at 0xc0>', 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"

The regex (<[^>]+>) matches and captures anything between < and >, and the re.sub call encloses it in quotes to make it a string.

>>> d = eval(data_cleaned)
>>> d
{'client': '<object at 0xc0>',
 'store': {'name': 'test',
  'number': 7,
  'modified': '2020-09-11T00:32:56Z',
  'id': '0833-f780'},
 're': re.compile(r'^(http://mysite.tesdt.com)/(.+)$', re.UNICODE)}

>>> d['store']['number']
7
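If eval feels too risky, one possible alternative is to turn every non-literal part of the text into a plain string and then parse it with ast.literal_eval, which only accepts Python literals. The following is a minimal sketch, not a general solution: it assumes the only non-literal parts are the <...> object repr and a single re.compile('...') call with a single-quoted pattern and no flags. The sample text is based on the one in the question.

```python
import ast
import re

# Sample text based on the question; in practice it would be read from the file.
data = (
    "{'client': <object at 0xc0>, "
    "'store': {'name': 'test', 'number': 7, "
    "'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, "
    "'re': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"
)

# Quote the <...> object repr so it becomes an ordinary string literal.
cleaned = re.sub(r"(<[^>]+>)", r"'\1'", data)

# Reduce re.compile('pattern') to just 'pattern'.
# Assumption: single-quoted pattern, no embedded quotes, no flags argument.
cleaned = re.sub(r"re\.compile\(('[^']*')\)", r"\1", cleaned)

# literal_eval refuses anything that is not a plain literal,
# so no code from the file is ever executed.
d = ast.literal_eval(cleaned)
print(d["store"]["number"])
```

With this approach the 're' entry ends up as the pattern string rather than a compiled regex, which is fine if only 'number' is needed.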

Of course, if all you care about is the value of number, then just do:

>>> number = [float(x) for x in re.findall(r"'number': (\d+\.?\d*)", data)]
>>> number[0]
7.0

CodePudding user response:

This is quite dirty, but if you really just want the value of number, and if that value sits between 'number': and a comma, you could do this:

import re


with open("file.json", "r") as f:
    s = f.read()

result = re.search(r"'number': (.*?),", s)
r = result.group(1)

print(r)

You might need checks for all sorts of cases, e.g. when 'number' is not in your text or when its value contains a comma.

Does someone know how to improve the regex so that it captures the text before the next comma?
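One possible way to add the checks mentioned above is to match only a numeric value instead of "anything up to the next comma", and to handle the case where nothing matches. This is a rough sketch under the assumption that number is always a plain integer or decimal; the function name extract_number and the sample string are made up for illustration.

```python
import re


def extract_number(text):
    """Return the first value after 'number': as int or float, or None."""
    # Match an optional minus sign, digits, and an optional decimal part,
    # so the capture stops at the number itself rather than at a comma.
    match = re.search(r"'number':\s*(-?\d+(?:\.\d+)?)", text)
    if match is None:
        # 'number' is missing or its value is not numeric.
        return None
    raw = match.group(1)
    return float(raw) if "." in raw else int(raw)


# Hypothetical sample, shortened from the text in the question.
sample = "{'store': {'name': 'test', 'number': 7, 'id': '0833-f780'}}"
print(extract_number(sample))
print(extract_number("{'a': 1}"))
```

Because the pattern only accepts digits, it also works when number is the last key in its dictionary and is followed by } instead of a comma.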
