Parsing YAML comments with ruamel.yaml

Time:11-13

I'm trying to parse the comments of a YAML file using ruamel.yaml. I want each comment to be associated with the key it precedes, which seems to me the logical way to read them. I have the following YAML file:

---
# comment for the foo variable
foo: 'foo_val'

# comment for the bar variable
bar: ['item1', 'item2']

And what I have been trying to do is the following:

from ruamel.yaml import YAML

yaml = YAML()  # round-trip mode by default, so comments are preserved
yaml.preserve_quotes = True
yaml.explicit_start = True

with open('my_file.yml', 'r') as stream:
    loaded = yaml.load(stream)

print(loaded.ca)  # comment information attached to the mapping
print(loaded)

which prints the following:

Comment(
start=[None, [CommentToken('# comment for the foo variable\n', line: 1, col: 0)]],
items={
foo: [None, None, CommentToken('\n\n# comment for the bar variable\n', line: 3, col: 0), None]})

ordereddict([('foo', 'foo_val'), ('bar', ['item1', 'item2'])])

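As you can see, the commented map (I believe that is what it is called) does not attach each comment to the key that follows it: the first comment ends up in start, and the comment for bar is stored under foo. I have also tried putting the comments below the variable definitions, but with the Python-style list that doesn't work either.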

Does anyone know if it's possible to get the commented map without the first comment being treated as the start object? Basically, my expected output would be each variable with its comment directly above it, i.e.:

Comment(
    items={
    foo: [None, None, CommentToken('\n\n# comment for the foo variable\n', line: 0, col: 0), None],
    bar: [None, None, CommentToken('\n\n# comment for the bar variable\n', line: 3, col: 0), None],
  })

CodePudding user response:

The YAML specification treats comments as a presentation detail that must not carry information, so a parser is free to discard them. As a result, they are not specified in a way that gives you tight control over the node they are associated with.

ruamel.yaml does its best to map comments to where it thinks they belong, so that the initial input can be reproduced, but it cannot in general give you the control you ask for while preserving the YAML syntax.

A possible solution would be to use a flow mapping, which has an explicit start and thus the comment is decidedly inside the mapping:

{
  # comment for the foo variable
  foo: 'foo_val',

  # comment for the bar variable
  bar: ['item1', 'item2']
}

Note the comma after 'foo_val' which is required by flow syntax. Indentation is optional.

CodePudding user response:

Comments in ruamel.yaml are in principle associated with the node of the key/index of the collection that is being processed, so almost never with something the parser hasn't seen yet (which is what you want). There are some exceptions to this (comments before any data in a file, comments between key and value), but that is the general rule. Empty lines and full-line comments are treated as continuations of end-of-line comments (even if no end-of-line comment exists for a certain key/index).

So although your interpretation is valid, there is currently no easy way to get to the comments the way you want. Work is in progress to improve that (giving control over how the "follow-up" full-line comments and empty lines are associated), but there is no estimate of when that will be included.
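
For simple files, one possible workaround is to bypass ruamel.yaml's comment handling and scan the text directly. The following is only a minimal sketch and is not part of ruamel.yaml: the names comments_above_keys and TOP_LEVEL_KEY are made up for illustration, and it assumes a block mapping whose keys are plain, unindented scalars and whose comments start in column 0. It would not handle quoted keys, nested mappings or flow style.

```python
import re

# Assumption: only unindented plain keys such as `foo:` are recognised.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w.-]*)\s*:(?:\s|$)")


def comments_above_keys(text):
    """Map each top-level key to the full-line comments directly above it."""
    comments = {}
    pending = []  # comment lines seen since the last non-comment content
    for line in text.splitlines():
        if line.startswith("#"):
            pending.append(line[1:].strip())
        elif not line.strip():
            continue  # blank lines do not break the association
        else:
            match = TOP_LEVEL_KEY.match(line)
            if match:
                comments[match.group(1)] = pending
            pending = []  # any other content (---, nested lines) resets
    return comments


SAMPLE = """\
---
# comment for the foo variable
foo: 'foo_val'

# comment for the bar variable
bar: ['item1', 'item2']
"""

for key, lines in comments_above_keys(SAMPLE).items():
    print(key, lines)
```

The values themselves can still be loaded with ruamel.yaml as before; the scan above only supplies the key-to-comment mapping and plays no part in round-tripping the file.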
