I'm writing a custom JSON compressor. It is going to be reading numbers in all formats. How do I print the values of the JSON in the format they are given in when using json.load()? I would also like to preserve the type.
An example of a file it would have to read:
{"a":301, "b":301.0, "c":3.01E2, "d":"301", "e":"301.0", "f":"3.01E2"}
I would also want it to be able to distinguish between 1, 1.0 and true.
When I use a basic for loop to print the values and their types after json.load(), it prints out:
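As a side note on telling 1, 1.0 and true apart: the json module already returns three different Python types for them, but Python treats the three values as equal, so any comparison by value merges them. A minimal check (the list literal below is only an illustration):

```python
import json

# Parse the three literals that should stay distinguishable
values = json.loads("[1, 1.0, true]")

for value in values:
    print(repr(value), type(value).__name__)  # 1 int / 1.0 float / True bool

# They compare equal, so == cannot tell them apart
print(values[0] == values[1] == values[2])  # True
# bool is a subclass of int, so isinstance(True, int) is also True
print(isinstance(values[2], int))  # True
# Equal values hash the same, so a set keeps only one of them
print(len({1, 1.0, True}))  # 1
```

So checking type(value) exactly, rather than using == or isinstance(value, int), is what keeps the three apart.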
301 int
301.0 float
301.0 float
301 str
301.0 str
3.01E2 str
And yes, I understand that numbers in scientific notation are floats.
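A minimal reproduction, assuming the loop simply prints each value and its type name (the original loop is not shown, and the file contents are inlined as a string here):

```python
import json

# The example file from the question, inlined instead of read from disk
text = '{"a":301, "b":301.0, "c":3.01E2, "d":"301", "e":"301.0", "f":"3.01E2"}'

data = json.loads(text)  # json.load(f) behaves the same way for an open file f
for key, value in data.items():
    print(value, type(value).__name__)
```

The third line of output is the problem: by the time json.loads returns, 3.01E2 has already been converted to the float 301.0, and its original spelling is gone.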
Expected output:
301 int
301.0 float
3.01E2 float
301 str
301.0 str
3.01E2 str
Answer:
If I understand correctly, you want to keep the original formatting of each number in the JSON, even when it is parsed as a float. One way to do this is to change the types in the JSON text itself before parsing, i.e. to add quotes around the numeric values so that they are read as strings.
This can be done with regex:
import json
import re
data = """{"a":301, "b":301.0, "c":3.01E2, "d":"301", "e":"301.0", "f":"3.01E2", "g": true, "h":"hello"}"""
# the crucial part: enclosing floats/ints that directly follow ":" in quotes
pattern = re.compile(r'(?<=:)\s*([+-]?\d+(?:\.\d*(?:E-?\d+)?)?)\b')
data_str = pattern.sub(r'"\1"', data)
val_dict = json.loads(data)  # the values as normally read by the json module
type_dict = {k: type(v) for k, v in val_dict.items()}  # their types
repr_dict = json.loads(data_str)  # the original representations (every number is a string here)
# using pandas for pretty formatting
import pandas as pd
df = pd.DataFrame([val_dict, type_dict, repr_dict], index=["Value", "Type", "Repr."]).T
print(df)
Output:
   Value             Type   Repr.
a     301    <class 'int'>     301
b   301.0  <class 'float'>   301.0
c   301.0  <class 'float'>  3.01E2
d     301    <class 'str'>     301
e   301.0    <class 'str'>   301.0
f  3.01E2    <class 'str'>  3.01E2
g    True   <class 'bool'>    True
h   hello    <class 'str'>   hello
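If pandas isn't needed, the same dicts can be combined to print exactly the format asked for in the question: the original spelling from the quoted version, followed by the type name from the normal parse. A minimal sketch reusing the answer's data string and regex:

```python
import json
import re

data = '{"a":301, "b":301.0, "c":3.01E2, "d":"301", "e":"301.0", "f":"3.01E2", "g": true, "h":"hello"}'
pattern = re.compile(r'(?<=:)\s*([+-]?\d+(?:\.\d*(?:E-?\d+)?)?)\b')

val_dict = json.loads(data)                         # real values, real types
repr_dict = json.loads(pattern.sub(r'"\1"', data))  # numbers kept as their original text

# "<original spelling> <type name>" for each key, in file order
lines = [f"{repr_dict[key]} {type(value).__name__}" for key, value in val_dict.items()]
print("\n".join(lines))
```

This prints 3.01E2 float for "c" and 3.01E2 str for "f", matching the expected output; true is printed as True bool because it is left unquoted and parsed as a Python bool.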
So here are the details of the regex:
- ([+-]?\d+(?:\.\d*(?:E-?\d+)?)?) is our matching group, consisting of:
  - [+-]? an optional leading + or -
  - \d+ one or more digits, followed (optionally) by
  - (?:\.\d*(?:E-?\d+)?)? a non-capturing group made of:
    - \. a dot
    - \d* zero or more digits
    - (optionally) an E with an (optional) minus sign -, followed by one or more digits \d+
- \b specifies a word boundary (so that the match doesn't cut a series of digits)
- (?<=:) is a lookbehind, ensuring the expression is directly preceded by : (so we don't add quotes around existing strings)
- \s* any whitespace between the : and the number is part of the match, so it is removed
- \1 is a back reference to our first group, so we replace the whole match with "\1"
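A quick way to see what the pattern does and does not quote; the inputs below are made-up examples. As written, it only recognises an uppercase E that comes after a decimal point, and only numbers that directly follow a colon, so array elements are left alone:

```python
import re

# The answer's pattern: a number directly after ":" (leading whitespace is consumed)
pattern = re.compile(r'(?<=:)\s*([+-]?\d+(?:\.\d*(?:E-?\d+)?)?)\b')

def quote_numbers(text):
    """Wrap every number that directly follows a colon in double quotes."""
    return pattern.sub(r'"\1"', text)

print(quote_numbers('{"c":3.01E2}'))   # {"c":"3.01E2"}  exponent after a dot: quoted
print(quote_numbers('{"x": -5}'))      # {"x":"-5"}      sign kept, space removed
print(quote_numbers('{"d":"301"}'))    # {"d":"301"}     already a string: unchanged
print(quote_numbers('{"k":3e2}'))      # {"k":3e2}       lowercase e without a dot: not matched
print(quote_numbers('{"l":[1, 2]}'))   # {"l":[1, 2]}    array elements don't follow ":"
```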
Edit: slightly changed the regex to replace only numbers directly following : and to take a leading +/- into account.
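A different approach that avoids rewriting the text: json.load() and json.loads() accept parse_int and parse_float callables, which are called with the exact source text of every JSON integer and float. The sketch below wraps that text in two small str subclasses (RawInt and RawFloat are hypothetical names introduced here), so both the original spelling and the number's JSON type survive. Unlike the regex, it also covers numbers inside arrays and never touches the contents of strings.

```python
import json

class RawInt(str):
    """Hypothetical helper: a JSON integer kept as its original text."""

class RawFloat(str):
    """Hypothetical helper: a JSON float (including exponent forms) kept as its original text."""

text = '{"a":301, "b":301.0, "c":3.01E2, "d":"301", "e":"301.0", "f":"3.01E2", "g": true, "h":"hello"}'

# parse_int / parse_float receive each number's exact text from the document
data = json.loads(text, parse_int=RawInt, parse_float=RawFloat)

def describe(value):
    """Return (original text, JSON-level type name) for one decoded value."""
    if isinstance(value, RawInt):
        return str(value), "int"
    if isinstance(value, RawFloat):
        return str(value), "float"
    if isinstance(value, bool):
        return json.dumps(value), "bool"  # true/false, spelled as in JSON
    return value, type(value).__name__    # plain str values are real JSON strings

for key, value in data.items():
    original, kind = describe(value)
    print(original, kind)
```

The wrappers still hold valid number text, so float(data["c"]) or int(data["a"]) gives the numeric value back when it is needed.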