Get wrong values by parsing YAML

Time:12-15

I'm somewhat confused by my YAML parsing results. I made a small test.yaml that reproduces the problem:

val_1: 05334000
val_2: 2345784
val_3: 0537380
str_1: foobar
val_4: 05798
val_5: 051342123

Parsing that with:

import yaml

with open('test.yaml', 'r', encoding='utf8') as f:
    a = yaml.load(f, Loader=yaml.FullLoader)

returns:

{'val_1': 1423360,
 'val_2': 2345784,
 'val_3': '0537380',
 'str_1': 'foobar',
 'val_4': '05798',
 'val_5': 10863699}

Why do val_1 and val_5 come back with these values? Is there something special about them?

My real data consists of many YAML files containing values like val_1. Some of them are parsed correctly and some are not. All of them start with 05, followed by more digits. Because of the leading 0, I expected the results to be strings, but for some values YAML returns something completely different.

If I read the YAML file as plain text with f.readlines(), everything is fine:

['val_1: 05334000\n',
 'val_2: 2345784\n',
 'val_3: 0537380\n',
 'str_1: foobar\n',
 'val_4: 05798\n',
 'val_5: 051342123\n']

CodePudding user response:

PyYAML follows YAML 1.1, where integers with a leading 0 are parsed as octal; in Python you'd need to write them with a leading 0o:

0o5334000 == 1423360

As for '0537380': because it contains the digit 8, it cannot be parsed as an octal number, so it remains a string. The same applies to '05798', which contains 9 and 8.
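
To make the rule concrete, here is a minimal sketch that mimics the behaviour described above using only the standard library. The function name resolve_scalar and the two regular expressions are my own simplification for illustration; they are an assumption, not PyYAML's actual resolver, and they ignore signs, underscores, hex, binary and floats.

```python
import re

# Simplified integer rules in the spirit of YAML 1.1 (illustrative assumption,
# not PyYAML's real implementation).
OCTAL = re.compile(r"^0[0-7]+$")            # leading 0, then only digits 0-7
DECIMAL = re.compile(r"^(0|[1-9][0-9]*)$")  # no leading 0 unless the value is 0


def resolve_scalar(text):
    """Return an int for octal/decimal-looking text, else the text itself."""
    if OCTAL.match(text):
        return int(text, 8)   # interpreted in base 8
    if DECIMAL.match(text):
        return int(text)      # ordinary base-10 integer
    return text               # anything else stays a string


samples = ["05334000", "2345784", "0537380", "foobar", "05798", "051342123"]
resolved = {s: resolve_scalar(s) for s in samples}
for raw, value in resolved.items():
    print(f"{raw!r} -> {value!r}")
```

Only the values made up entirely of the digits 0 to 7 after the leading 0 are converted, which matches the output in the question.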


If you want to get strings for all your entries, you can use the BaseLoader:

from io import StringIO
import yaml

file = StringIO("""
val_1: 05334000
val_2: 2345784
val_3: 0537380
str_1: foobar
val_4: 05798
val_5: 051342123
""")

dct = yaml.load(file, Loader=yaml.BaseLoader)

With that I get:

{'val_1': '05334000', 'val_2': '2345784', 'val_3': '0537380', 
 'str_1': 'foobar', 'val_4': '05798', 'val_5': '051342123'}
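
Note that with the BaseLoader val_2 is a string as well. If you still want plain integers for the values without a leading zero, one option is to convert them afterwards. The following is only a sketch: restore_ints is a hypothetical helper, and it assumes a flat mapping whose values are all strings, as in the result above.

```python
def restore_ints(mapping):
    """Convert digit-only values without a leading zero to int.

    Hypothetical helper; assumes a flat dict whose values are all strings.
    """
    out = {}
    for key, value in mapping.items():
        is_digits = value.isascii() and value.isdigit()
        if is_digits and (value == "0" or not value.startswith("0")):
            out[key] = int(value)   # e.g. '2345784' -> 2345784
        else:
            out[key] = value        # keep leading-zero values and text as-is
    return out


# The dictionary returned by the BaseLoader example above.
dct = {'val_1': '05334000', 'val_2': '2345784', 'val_3': '0537380',
       'str_1': 'foobar', 'val_4': '05798', 'val_5': '051342123'}

converted = restore_ints(dct)
print(converted)
```

This keeps every value that starts with 0 exactly as written in the file and only turns val_2 back into an integer.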