Converting Python dictionary to YAML file with lists as multiline strings

Time:01-28

I'm trying to convert a Python dictionary of the following form:

{
    "version": "3.1",
    "nlu": [
        {
            "intent": "greet",
            "examples": ["hi", "hello", "howdy"]
        },
        {
            "intent": "goodbye",
            "examples": ["goodbye", "bye", "see you later"]
        }
    ]
}

to a YAML file of the following form (note the pipe preceding the value associated with each examples key):

version: "3.1"
nlu:
- intent: greet
  examples: |
    - hi
    - hello
    - howdy
- intent: goodbye
  examples: |
    - goodbye
    - bye
    - see you later

Apart from the pipes, which Rasa's training data format requires, I know how to do this with yaml.dump().

What's the most straightforward way to obtain the format I'm after?

EDIT: Converting the value of each examples key to a string first yields a YAML file which is not at all reader-friendly, especially given that I have many intents comprising many hundreds of total example utterances.

version: '3.1'
nlu:
- intent: greet
  examples: "  - hi\n  - hello\n  - howdy\n" 
- intent: goodbye
  examples: "  - goodbye\n  - bye\n  - see you later\n"  

I understand that these quoted strings are equivalent to what the pipe symbol expresses, but I'd like output that is easier to read. Is that possible?

CodePudding user response:

You are asking for the examples value to be represented in your YAML output as a multi-line string in literal block style, introduced by the | indicator.

In your Python data, examples is a list of strings, not a multiline string:

{
    "intent": "greet",
    "examples": ["hi", "hello", "howdy"]
},

Of course a Python list will be represented as a YAML list.
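
As a quick check of that default, here is a minimal sketch that dumps one item unchanged. The variable names item and text are only illustrative.

```python
import yaml

item = {"intent": "greet", "examples": ["hi", "hello", "howdy"]}

# default_flow_style=False forces block style; it is already the default in PyYAML 5.1+.
text = yaml.safe_dump(item, default_flow_style=False)
print(text)
```

The examples key comes out as an ordinary YAML sequence (one "- item" line per element), with no pipe anywhere.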

If you want it rendered as a literal block scalar, you need to transform the Python value into a multi-line string ("examples": "- hi\n- hello\n- howdy\n"), and then you need to configure the yaml module to output multi-line strings in literal block style (|).

Something like this:

import yaml

data = {
    "version": "3.1",
    "nlu": [
        {
            "intent": "greet",
            "examples": ["hi", "hello", "howdy"]
        },
        {
            "intent": "goodbye",
            "examples": ["goodbye", "bye", "see you later"]
        }
    ]
}

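def quoted_presenter(dumper, data):
    # Multi-line strings use literal block style (|); all other strings are double-quoted.
    if '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    else:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

# Registered on the default Dumper, so it affects yaml.dump() but not yaml.safe_dump().
yaml.add_representer(str, quoted_presenter)

# Serialize each examples list to a YAML string first; the result ends with a newline.
# default_flow_style=False keeps one "- item" per line on older PyYAML versions too.
for item in data['nlu']:
    item['examples'] = yaml.safe_dump(item['examples'], default_flow_style=False)

print(yaml.dump(data))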

This will output:

"nlu":
- "examples": |
    - hi
    - hello
    - howdy
  "intent": "greet"
- "examples": |
    - goodbye
    - bye
    - see you later
  "intent": "goodbye"
"version": "3.1"

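Yes, this quotes everything (keys as well as values), because the representer is registered for every str and its else branch forces double quotes. A representer registered on plain str sees only the string itself and cannot tell a key from a value, so that is about the limit of its granularity. Without the custom representer, we would get instead: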

nlu:
- examples: '- hi

    - hello

    - howdy

    '
  intent: greet
- examples: '- goodbye

    - bye

    - see you later

    '
  intent: goodbye
version: '3.1'

That's semantically identical: both documents load back to exactly the same data. Only the formatting differs.
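
If quoting every key is undesirable, one option is to stay with PyYAML but register representers on dedicated str subclasses instead of on str itself. The sketch below assumes two hypothetical marker classes, LiteralStr and QuotedStr, which are not part of PyYAML. It also builds the examples string by hand rather than through safe_dump(), and relies on sort_keys, which needs PyYAML 5.1 or later.

```python
import yaml


class LiteralStr(str):
    """Hypothetical marker: dump this string in literal block style (|)."""


class QuotedStr(str):
    """Hypothetical marker: dump this string in double quotes."""


def represent_literal(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style="|")


def represent_quoted(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style='"')


# Plain str is left alone, so keys and ordinary values keep PyYAML's default style.
yaml.add_representer(LiteralStr, represent_literal, Dumper=yaml.SafeDumper)
yaml.add_representer(QuotedStr, represent_quoted, Dumper=yaml.SafeDumper)

data = {
    "version": QuotedStr("3.1"),
    "nlu": [
        {"intent": "greet", "examples": ["hi", "hello", "howdy"]},
        {"intent": "goodbye", "examples": ["goodbye", "bye", "see you later"]},
    ],
}

for item in data["nlu"]:
    # One "- utterance" per line; the trailing newline yields "|" rather than "|-".
    block = "".join(f"- {example}\n" for example in item["examples"])
    item["examples"] = LiteralStr(block)

# sort_keys=False keeps the insertion order (version first, then nlu).
text = yaml.safe_dump(data, sort_keys=False)
print(text)
```

Under those assumptions only the wrapped values change style: each examples value should follow a bare | and version should appear in double quotes, while the keys and the intent values stay plain.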

ruamel.yaml provides more control over the output format, for example through per-value scalar string types such as LiteralScalarString.
