Is there a way to apply transformations on dump() with ruamel.yaml?

Time:05-10

Context

I am using ruamel.yaml (0.17.21) to automatically inject or update nested objects in a collection of existing YAML documents.

All these documents have a maximum line length of 120 characters, enforced by a linter. I expected to be able to retain this formatting rule by setting the width attribute on the YAML instance. In practice, however, unbreakable words such as URLs end up overflowing the 120-character limit when dumped back to the output stream.

For example, the following code reformats the input as shown in the diff below, although I didn't perform any modification to it:

from ruamel.yaml import YAML
import sys

yaml = YAML()
yaml.width = 120

input = yaml.load('''\
arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
     https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
''')

yaml.dump(input, sys.stdout)

 arn:
-  description: ARN of the Log Group to source data from. The expected format is documented at
-    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
   description: ARN of the Log Group to source data from. The expected format is documented at https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies

Question

Is there a way I could influence the result of dump() without implementing my own Emitter, such as manually verifying that a generated line doesn't overflow the desired maximum line length, and wrap it myself if that's the case?

Answer

There is something strange going on in the emitter for plain scalars. That is old (inherited) code, so it might take some time to fix without breaking other things.

I think you can programmatically correct these with the following WrapToLong class passed to the transform argument. I use a class here so you don't need to use some global variable for getting the width to the routine doing the actual work:

from ruamel.yaml import YAML
import sys

yaml = YAML()
yaml.width = 120

input = yaml.load('''\
arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
     https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
''')

class WrapToLong:
    def __init__(self, width, indent=2):
        self._width = width
        self._indent = indent

    def __call__(self, s):
        res = []
        for line in s.splitlines():
            if len(line) > self._width and ' ' in line:
                # count the leading spaces of the overlong line
                idx = 0
                while line[idx] == ' ':
                    idx += 1
                # move the last word to a continuation line, indented one level deeper
                line, rest = line.rsplit(' ', 1)
                res.append(line)
                res.append(' ' * (idx + self._indent) + rest)
                continue
            res.append(line)
        return '\n'.join(res) + '\n'

yaml.dump(input, sys.stdout, transform=WrapToLong(yaml.width))

which gives:

arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
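
Since the transform only receives the dumped text and returns the text to write, the wrapping logic can also be tried without ruamel.yaml. The following is a minimal sketch of the same idea as a plain function, with the width bound through functools.partial instead of a class. The name wrap_long_lines, the sample text and the width of 80 are hypothetical and chosen only for illustration. Like the class above, it moves only the last word of an overlong line, so a line may still be too long if the remainder exceeds the width.

```python
from functools import partial


def wrap_long_lines(text, width=120, indent=2):
    """Move the last word of each overlong line to an indented continuation line."""
    out = []
    for line in text.splitlines():
        # Only split when there is a space between words, not just indentation.
        if len(line) > width and ' ' in line.strip():
            lead = len(line) - len(line.lstrip(' '))  # leading spaces of this line
            head, tail = line.rsplit(' ', 1)
            out.append(head)
            out.append(' ' * (lead + indent) + tail)
        else:
            out.append(line)
    return '\n'.join(out) + '\n'


# Hypothetical dumped text: one overlong line ending in an unbreakable word.
dumped = 'arn:\n  description: ' + 'word ' * 10 + 'https://example.com/' + 'x' * 40 + '\n'

# Bind the width once; the result takes a single string argument.
transform = partial(wrap_long_lines, width=80)
print(transform(dumped), end='')
```

Assuming the transform argument is called with the complete dumped string, as it is in the class-based version, the bound function could be passed the same way: yaml.dump(input, sys.stdout, transform=partial(wrap_long_lines, width=yaml.width)).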

You could instead use folded scalars (using >-); ruamel.yaml keeps the newlines of those where they were. But you would need to update all your YAML files (programmatically), and you could not easily update the loaded string if the text before the URL changes, because that can change the positions where the string was folded.
