I have a very strange data file, and I have no idea how to loop through its keys. Here is the file:
(The file is generated by an API server, and there is no way to change the input.)
{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}
I am trying to extract 'number' from the data, but there seems to be no way to do it. I have tried json.loads, eval(data), and various other combinations to convert it to a native Python dict. As you can see below, none of these attempts worked:
Try #1:
file = "file.json"
data = file.read()
parsed = json.loads(data)
print(data)
Error:
AttributeError: 'str' object has no attribute 'read'
Try #2:
with open("file.json", "r") as f:
    data = f.read()
d = ast.literal_eval(data)
print(d)
Error:
{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}
^
SyntaxError: invalid syntax
Try #3:
with open("file.json", "r") as f:
    data = f.read()
data = data.replace("'", '"')
print(data)
js = json.loads(data)
print(js)
Error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 12 (char 11)
Try #4:
with open("file.json", "r") as f:
    data = f.read()
data = str(data)
print(json.dumps(data))
js = json.loads(data)
print(js)
Error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Answer:
That is not JSON; it looks like the repr of a Python dictionary. eval would work if not for the <object at 0xc0> portion. You could try getting rid of that and then use eval.
Note that eval is quite unsafe, and only acceptable if you control where the input comes from and are sure it won't contain anything malicious.
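If you would rather not run eval on the input at all, one possible alternative (a swapped-in technique, not what this answer uses) is ast.literal_eval, which only accepts Python literals and never executes code. It rejects both the <object ...> text and the re.compile(...) call, so this sketch quotes the first and replaces the second with its bare pattern string before parsing. It assumes the pattern itself contains no single quotes.

```python
import ast
import re

# Sample input from the question.
data = "{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"

# Quote any <...> placeholder so it becomes a string literal.
cleaned = re.sub(r"(<[^>]+>)", r"'\1'", data)
# Replace re.compile('...') with just its quoted pattern string.
# Assumption: the pattern contains no single quotes of its own.
cleaned = re.sub(r"re\.compile\(('[^']*')\)", r"\1", cleaned)

d = ast.literal_eval(cleaned)   # only literals are parsed; nothing is executed
d["re"] = re.compile(d["re"])   # optionally turn the pattern back into a regex
print(d["store"]["number"])     # 7
```

The trade-off is that every non-literal construct in the input has to be rewritten by hand first, but anything unexpected makes literal_eval raise an error instead of running it.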
>>> import re
>>> data = """{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"""
>>> data_cleaned = re.sub(r"(<[^>]+>)", r"'\1'", data)
>>> data_cleaned
"{'client': '<object at 0xc0>', 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"
The regex (<[^>]+>) matches and captures a <, followed by one or more characters other than >, followed by the closing >. The re.sub call then wraps that match in quotes to turn it into a string.
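A quick way to see what the pattern matches, and what the \1 back-reference puts back, is to run it on a shortened version of the data (shortened here only for readability):

```python
import re

# Shortened version of the question's data, for illustration only.
data = "{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7}}"

# [^>]+ means "one or more characters other than '>'", so each match runs
# from a '<' up to the first following '>' and includes both brackets.
matches = re.findall(r"<[^>]+>", data)
print(matches)  # ['<object at 0xc0>']

# \1 in the replacement refers to capture group 1, so the matched text is
# re-inserted wrapped in single quotes.
cleaned = re.sub(r"(<[^>]+>)", r"'\1'", data)
print(cleaned)  # {'client': '<object at 0xc0>', 'store': {'name': 'test', 'number': 7}}
```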
>>> d = eval(data_cleaned)
>>> d
{'client': '<object at 0xc0>',
 'store': {'name': 'test',
           'number': 7,
           'modified': '2020-09-11T00:32:56Z',
           'id': '0833-f780'},
 're': re.compile(r'^(http://mysite.tesdt.com)/(.+)$', re.UNICODE)}
>>> d['store']['number']
7
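The question also asks how to loop through the keys. Once d is a real dict it can be walked like any other; this sketch uses a hypothetical walk() helper (not from the answer) that recursively yields the key path and value of every non-dict entry. It cleans and evals the data the same way as above, so the same safety caveat about eval applies.

```python
import re  # eval needs `re` in scope to evaluate the re.compile(...) call

data = "{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"
d = eval(re.sub(r"(<[^>]+>)", r"'\1'", data))


def walk(obj, prefix=()):
    """Hypothetical helper: yield (key_path, value) for every non-dict value."""
    for key, value in obj.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from walk(value, path)  # descend into nested dicts
        else:
            yield path, value


for path, value in walk(d):
    print(" -> ".join(path), "=", value)
```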
Of course, if all you care about is the value of number, then just do:
>>> number = [float(x) for x in re.findall(r"'number': (\d+\.?\d*)", data)]
>>> number[0]
7.0
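The float() conversion turns 7 into 7.0. If you would rather keep whole numbers as int, one possible variant (an assumption about what you want, not part of the answer) is to convert based on whether the match contains a decimal point:

```python
import re

data = "{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"

# -? allows a sign, \d+ requires at least one digit, and (?:\.\d+)? optionally
# matches a decimal part. findall returns the text of group 1 for each match.
matches = re.findall(r"'number':\s*(-?\d+(?:\.\d+)?)", data)

# Use float only when there is a decimal point; otherwise keep an int.
numbers = [float(m) if "." in m else int(m) for m in matches]
print(numbers)  # [7]
```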
Answer:
This is quite dirty, but if you really just want the value of number, and number is always enclosed between 'number': and a comma, you could do this:
import re
with open("file.json", "r") as f:
    s = f.read()
result = re.search(r"'number': (.*?),", s)
r = result.group(1)
print(r)
You might need checks for all sorts of cases, e.g. "number" is not in your text, or the value of "number" contains a comma.
Does anyone know how to improve the regex so that it captures the text before the next comma?
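One partial improvement, sketched here as a hypothetical extract_value() helper, is to stop at either a comma or a closing brace and to return None when the key is missing instead of failing on result.group(1). With (.*?), the last key of a dict, such as 'id', would otherwise capture the trailing }. It still breaks if a value itself contains a comma or brace (for example a quoted string like 'a, b'). To use it on the file, pass in the text read with f.read().

```python
import re


def extract_value(text, key):
    """Hypothetical helper: raw text after 'key': up to the next ',' or '}'.

    Returns None if the key is not present. Values containing ',' or '}'
    (e.g. quoted strings with commas) are still cut short.
    """
    pattern = r"'" + re.escape(key) + r"':\s*([^,}]+)"
    match = re.search(pattern, text)
    if match is None:  # key not present in the text
        return None
    return match.group(1).strip()


s = "{'client': <object at 0xc0>, 'store': {'name': 'test', 'number': 7, 'modified': '2020-09-11T00:32:56Z', 'id': '0833-f780'}, 're': re.compile('^(http://mysite.tesdt.com)/(.+)$')}"
print(extract_value(s, "number"))   # 7
print(extract_value(s, "id"))       # '0833-f780'  (quotes included, no trailing })
print(extract_value(s, "missing"))  # None
```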