Why does float() not "unstring" a string formatted list element?

Time:02-10

I need to parse a comma delimited string that is part of a dict into values. I receive the data originally as a (huge) JSON formatted string and am loading it into a dict with json.loads().

Some of the values split from the string will always be floats and I just want to float() those, while others can be either strings, null, or empty and will need to be treated separately (not the topic of this question).

Bizarrely, after splitting the string, the resulting list appears to contain some non-float()-able version of a string.

Consider this small Python 3 example:

import json

# construct the sample dict
var = {}
var['abc'] = '"123.456","zzz"'

# JSONify it
var_json = json.dumps(var)

print("var_json: %s" % json.loads(var_json))

# var_json now exemplifies the input data
# deJSONify it:
data_in = json.loads(var_json)

a = data_in['abc'].split(",")
print(a[0])
print("this works:", float("123.456"))
print("this borks:", float(a[0]))

This results in the following output:

var_json: {'abc': '"123.456","zzz"'}
"123.456"
this works: 123.456
Traceback (most recent call last):
  File "./test.py", line 26, in <module>
    print("this borks:", float(a[0]))
ValueError: could not convert string to float: '"123.456"'

So clearly, to Python, the value in the list resulting from the split is a string (it has double quotes around it in the output). But using float() on that string doesn't work.

Changing that last line to manually replace the quotes works:

print(float(a[0].replace("\"", "")))

So it looks like a[0] is in fact a string containing double quotes.

The same error occurs even without the json.dumps/loads roundtrip, e.g. just accessing the split list from the dict directly:

print("This also borks: ", float(var['abc'].split(",")[0]))

Why does float() not "unstring" what very clearly is a string and a valid float conversion input? How can I avoid that .replace() call?

CodePudding user response:

It's not enough to just split on commas; you also need to remove the literal quotation marks from the content of the string.

Quotation marks are not digits. A string that contains quotation marks as part of its data is therefore not a string that contains only a number. Just as the Python string 'a123a' cannot be parsed as a number, neither can '"123"': the " characters are just as out of place in the second example as the a characters are in the first.

For example, you could use:

float(a[0].replace('"', ''))
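To see the stray quote characters directly, it can help to look at the length and repr() of the element rather than at what print() shows. The following is only a minimal sketch using the sample value from the question; is_floatable is a hypothetical helper introduced for illustration, and str.strip('"') is shown as an alternative to replace() that removes quotes only at the ends of the string:

```python
s = '"123.456","zzz"'
a = s.split(",")

# The element holds the 7 characters of 123.456 plus two literal quote characters.
print(len(a[0]))       # 9
print(repr(a[0]))      # '"123.456"'
print(a[0][0] == '"')  # True


def is_floatable(text):
    """Hypothetical helper: report whether float() accepts the given text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(is_floatable(a[0]))             # False: the quotes are part of the data
print(is_floatable(a[0].strip('"')))  # True: only 123.456 is left
```

The quotes that print() displays are characters inside the string, not Python's own string delimiters, which is why float() rejects the value.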

Insofar as your JSON document is encapsulating CSV data, you can use the Python csv module to parse it in a way that will remove those quotes:

import csv

data_in = {'abc': '"123.456","zzz"'}
# csv.reader expects an iterable of lines, so wrap the single string in a list
a = next(csv.reader([data_in['abc']]))
print("this now works:", float(a[0]))
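Since the question mentions that only some fields are floats, the csv approach could be wrapped in a small function that converts what it can and leaves the rest alone. This is just a sketch under that assumption; parse_fields is a hypothetical name, and how to treat empty or null-like fields is left open, as in the question:

```python
import csv


def parse_fields(line):
    """Hypothetical helper: split one CSV-style line and convert fields to float where possible."""
    fields = next(csv.reader([line]))
    parsed = []
    for field in fields:
        try:
            parsed.append(float(field))
        except ValueError:
            # Non-numeric (or empty) fields stay as plain strings.
            parsed.append(field)
    return parsed


result = parse_fields('"123.456","zzz"')
print(result)  # [123.456, 'zzz']
```

One reason to prefer csv over split(",") here is that a comma inside a quoted field stays part of that field instead of splitting it in two. Note that float() also accepts strings such as 'nan' and 'inf', so a field with that text would be converted as well.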