Context
I am using ruamel.yaml (0.17.21) to automatically inject/update nested objects in a collection of existing YAML documents.
All these documents have a maximum line length of 120 characters, enforced by a linter.
I was expecting to be able to retain this formatting rule by setting the width attribute on the YAML instance. However, in practice, a line that was wrapped before an unbreakable word such as a URL is joined back together when dumped to the output stream, and ends up overflowing the 120-character limit.
For example, the following code reformats the input as shown in the diff below, although I didn't perform any modification to it:
from ruamel.yaml import YAML
import sys

yaml = YAML()
yaml.width = 120
input = yaml.load('''\
arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
''')
yaml.dump(input, sys.stdout)
 arn:
-  description: ARN of the Log Group to source data from. The expected format is documented at
-    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
+  description: ARN of the Log Group to source data from. The expected format is documented at https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
Question
Is there a way I could influence the result of dump() without implementing my own Emitter, such as manually verifying that a generated line doesn't overflow the desired maximum line length, and wrapping it myself if that's the case?
Answer
There is something strange going on in the emitter for plain scalars. That is old (inherited) code, so it might take some time to fix without breaking other things.
I think you can programmatically correct these by passing an instance of the following WrapToLong class as the transform argument of dump(). I use a class here so you don't need some global variable to get the width to the routine doing the actual work:
from ruamel.yaml import YAML
import sys

yaml = YAML()
yaml.width = 120
input = yaml.load('''\
arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
''')


class WrapToLong:
    def __init__(self, width, indent=2):
        self._width = width
        self._indent = indent

    def __call__(self, s):
        # s is the complete dumped document as a single string
        res = []
        for line in s.splitlines():
            if len(line) > self._width and ' ' in line:
                # count the leading spaces of the overlong line
                idx = 0
                while line[idx] == ' ':
                    idx += 1
                # move the last word to its own, further indented line
                line, rest = line.rsplit(' ', 1)
                res.append(line)
                res.append(' ' * (idx + self._indent) + rest)
                continue
            res.append(line)
        return '\n'.join(res) + '\n'


yaml.dump(input, sys.stdout, transform=WrapToLong(yaml.width))
which gives:
arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
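Note that the URL by itself appears to be longer than 120 characters, so the last line of that output still exceeds the limit; it just cannot be broken any further. If you want to verify the transformed output yourself, as the question suggests, something along the following lines might help. This is only a sketch: overlong_breakable_lines is a hypothetical helper name, not part of ruamel.yaml, and it assumes your linter tolerates overlong lines that contain no space to break at.

```python
def overlong_breakable_lines(text, width):
    """Return (line_number, length) for every line longer than `width`
    that still contains a space after stripping its indentation,
    i.e. a line that could have been wrapped.

    Hypothetical helper for illustration; not part of ruamel.yaml.
    """
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > width and ' ' in line.strip():
            found.append((number, len(line)))
    return found


# Stand-in for dumped YAML: line 3 is an unbreakable URL-like word,
# line 4 is an overlong line that still has spaces to break at.
dumped = "\n".join([
    "arn:",
    "  description: short text that fits",
    "    https://example.com/" + "x" * 130,
    "  other: " + "word " * 30,
]) + "\n"

print(overlong_breakable_lines(dumped, 120))
```

Only the fourth line is reported here, because the third one has nothing to break at. You could run such a check on the string returned by the transform, or call it from inside the transform before returning.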
You could instead use folded scalars (using >-); ruamel.yaml keeps their newlines where they were. But you would need to update all your YAML files (programmatically), and you could not easily update the loaded string if the text before the URL changes, because that can change the positions where the string was folded.
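For illustration, the document from the question rewritten with a folded scalar could look roughly like this (a sketch of the input format only; the value it loads to should be the same single-line string as before):

```yaml
arn:
  description: >-
    ARN of the Log Group to source data from. The expected format is documented at
    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
```

The > folds the line break into a single space when loading, and the - strips the trailing newline.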