dumping YAML with tags as JSON

Time:05-22

I know I can use ruamel.yaml to load a file with tags in it. But when I try to dump the data without the tags, I get an error. Simplified example:

from ruamel.yaml import YAML
from json import dumps
import sys

yaml = YAML()
data = yaml.load(
"""
!mytag
a: 1
b: 2
c: 2022-05-01
"""
)

try:
    yaml2 = YAML(typ='safe', pure=True)
    yaml2.default_flow_style = True
    yaml2.dump(data, sys.stdout)
except Exception as e:
    print('exception dumping using yaml', e)
try:
    print(dumps(data))
except Exception as e:
    print('exception dumping using json', e)

exception dumping using yaml cannot represent an object: ordereddict([('a', 1), ('b', 2), ('c', datetime.date(2022, 5, 1))])

exception dumping using json Object of type date is not JSON serializable

I cannot change the load() without getting an error on the tag. How can I get output with the tags stripped (as YAML or JSON)?

CodePudding user response:

You get the error because neither the safe dumper (pure or not) nor JSON knows about the ruamel.yaml internal types that preserve comments, tagging, block/flow style, etc.

When dumping as YAML, you could register these types with alternate dump methods. For JSON this is more complex, as AFAIK you can only convert the leaf nodes (i.e. the YAML scalars; you could, for example, use that to dump the datetime.date instance that is loaded as the value of key c).
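As a minimal sketch of that leaf-node limitation, using only the standard library (the sample dict below is made up for illustration and is not the loaded ruamel.yaml data): the default= hook of json.dumps() is called for values it cannot serialize, but never for keys.

```python
import datetime
import json

# Illustrative plain-Python data, standing in for already-untagged values.
data = {'a': 1, 'c': datetime.date(2022, 5, 1)}

# default= is only consulted for values json cannot serialize itself;
# str() turns the date into its ISO representation.
encoded = json.dumps(data, default=str)
print(encoded)

# Keys do not go through default=, so a tuple key still raises TypeError.
key_failed = False
try:
    json.dumps({('d', 'e'): [42, 196]}, default=str)
except TypeError as e:
    key_failed = True
    print('cannot dump key:', e)
```

This is why non-scalar keys have to be converted before the data is handed to json.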

I have used YAML as a readable, editable and programmatically updatable config file, with a much faster-loading JSON version of the data used if its file is not older than the corresponding YAML (if it is older, the JSON gets created from the YAML). To be able to call json.dump() or json.dumps(), you need to recursively generate Python primitives that JSON understands.

The following does so, but there are other constructs besides date and datetime instances that JSON doesn't allow, e.g. sequences or dicts used as keys (which is allowed in YAML, but not in JSON). For keys that are sequences I concatenate the string representations of the elements, and for keys that are mappings I concatenate the key-value pairs:

import ruamel.yaml  # binds the name `ruamel`, needed for the isinstance checks below
from ruamel.yaml import YAML
import sys
import datetime
import json
from collections.abc import Mapping

yaml = YAML()
data = yaml.load("""\
!mytag
a: 1
b: 2
c: 2022-05-01
[d, e]: !myseq [42, 196]
{f: g, 18: y}: !myscalar x
""")

def json_dump(data, out, indent=None):
    def scalar(obj):
        if obj is None:
            return None
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return str(obj)
        if isinstance(obj, ruamel.yaml.scalarbool.ScalarBoolean):
            return obj == 1
        if isinstance(obj, bool):
            return bool(obj)
        if isinstance(obj, int):
            return int(obj)
        if isinstance(obj, float):
            return float(obj)
        if isinstance(obj, tuple):
            return '_'.join([str(x) for x in obj])
        if isinstance(obj, Mapping):
            return '_'.join([f'{k}-{v}' for k, v in obj.items()])
        if not isinstance(obj, str):
            # report any leaf type that is not handled explicitly above
            print('type', type(obj))
        return obj

    def prep(obj):
        if isinstance(obj, dict):
            return {scalar(k): prep(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [prep(elem) for elem in obj]
        if isinstance(obj, ruamel.yaml.comments.TaggedScalar):
            return prep(obj.value)
        return scalar(obj)

    res = prep(data)
    json.dump(res, out, indent=indent)


json_dump(data, sys.stdout, indent=2)

which gives:

{
  "a": 1,
  "b": 2,
  "c": "2022-05-01",
  "d_e": [
    42,
    196
  ],
  "f-g_18-y": "x"
}
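
The caching scheme mentioned earlier (use the JSON file unless it is older than the YAML file) could be sketched roughly as follows. The function name load_config and the parse_yaml callable are hypothetical; parse_yaml stands for whatever loads the YAML and converts it to JSON-compatible primitives, for example a combination of yaml.load() and a conversion like prep() above.

```python
import json
from pathlib import Path


def load_config(yaml_path, json_path, parse_yaml):
    """Return the config data, preferring the JSON cache when it is fresh.

    parse_yaml is a hypothetical callable: it receives the YAML path and
    must return plain, JSON-serializable Python data.
    """
    yaml_path, json_path = Path(yaml_path), Path(json_path)
    # The cache is usable only if it exists and is not older than the YAML.
    if json_path.exists() and json_path.stat().st_mtime >= yaml_path.stat().st_mtime:
        with json_path.open() as fp:
            return json.load(fp)
    # Cache missing or stale: parse the YAML and rewrite the JSON file.
    data = parse_yaml(yaml_path)
    with json_path.open('w') as fp:
        json.dump(data, fp)
    return data
```

Comparing modification times is only one possible freshness check; it assumes both files live on a filesystem with reliable timestamps.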