Pythonic way to parse the output of `zpool status` into a dictionary - inconsistent output causing m

Time:09-11

I'm trying to parse the output of the zfs command zpool status, which gives me an output like so:

  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

My goal is to convert this output to a dictionary, like so:

{
    'pool': 'tank',
    'state': 'ONLINE',
    'scan': 'resilvered 35.6G in...',
    'config': 'NAME        STATE     READ WRITE CKSUM...',
    'errors': 'No known data errors'
}

I'm experiencing two problems that are causing me to write messy code:

  1. Not every line, such as the scan line, is displayed every time the command is run, and additional lines are possible that are not displayed above
  2. The config line has a few newlines before its output, which makes splitting difficult

I've tried a few different ways of doing this, but my code gets bogged down in a pile of conditionals, and since this is Python I figured there must be a cleaner way.

This is the "cleanest" method I've found, but it's not super-readable and it doesn't work with the config line:

# output = `zpool status` output
d = {}

for entry in map(lambda x: x.strip(), output.split('\n')):
    if 'state' in entry:
        pool_state = entry.split(' ')
        key = pool_state[0]
        val = pool_state[1]
        d[key] = val
    if 'status' in entry:
        ...
    if 'config' in entry:
        # entry does not contain output of the config: line
        pass

CodePudding user response:

Here is an example.

s = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors"""

res = {}
for line in s.splitlines():
    if line == "":  # Stop at the first blank line; config and errors are skipped
        break
    k, v = line.lstrip(" ").split(":", 1)
    if v:
        res[k] = v.lstrip(" ")

Result:

{'pool': 'tank', 'state': 'ONLINE', 'scan': 'resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022'}
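
The loop above stops at the first blank line, so the config table and the errors line never reach the dictionary. Below is a minimal sketch of one way to keep them. It assumes every section starts with a word-only key followed by a colon, and that any other non-blank line continues the most recent section. The names parse_status and KEY_LINE are made up for this illustration.

```python
import re

# Assumption: a section header looks like "<word>:" with optional text after it.
KEY_LINE = re.compile(r"^\s*(\w+):\s*(.*)$")

sample = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors"""


def parse_status(text):
    """Collect 'key: value' sections; keyless lines extend the previous section."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            # A new section starts; keep any text that follows the colon.
            current, value = match.groups()
            sections[current] = [value] if value else []
        elif current is not None and line.strip():
            # Non-blank line without a key: it belongs to the current section.
            sections[current].append(line.strip())
    return {key: "\n".join(lines) for key, lines in sections.items()}


parsed = parse_status(sample)
```

Because sections are only created when their key appears, optional lines such as scan can simply be read with parsed.get('scan'). Stripping each line discards the indentation of the config table, so this sketch matches the flat string asked for in the question rather than the device hierarchy.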

CodePudding user response:

I recommend using re.split to split on the keys (pool, state, scan, ...) that appear at the beginning of a line and are followed by a colon, then pairing the keys with their values using zip to build the dictionary.

You can also parse the config section into a list of dictionaries.

import re
from pprint import pprint

s = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors"""

def parse_data(data):
    # flags must be passed by keyword: the third positional argument of re.split is maxsplit
    parts = re.split(r'(?:\n|^)\s*(\w*):\s*', data.strip(), flags=re.MULTILINE)[1:]
    parsed = dict(zip(parts[::2], parts[1::2]))
    return {
        **parsed,
        'config': parse_config(parsed.get('config', ''))
    }


def parse_config(data):
    lines = [v.strip().split() for v in data.splitlines() if v.strip()]
    if lines:
        return [
            dict(zip(lines[0], v))
            for v in lines[1:]
        ]
    return []
    

pprint(parse_data(s))

Output should be:

{'config': [{'CKSUM': '0',
             'NAME': 'tank',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'raidz2-0',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sda',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdc',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdb',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdd',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdf',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'}],
 'errors': 'No known data errors',
 'pool': 'tank',
 'scan': 'resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 '
         '2022',
 'state': 'ONLINE'}
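
The rows produced by parse_config above keep every value as a string and drop the indentation that shows which device belongs to which vdev. Below is a minimal sketch of one way to retain both pieces of information. It assumes each nesting level is indented by two more columns than its parent, as in the sample, and that only READ, WRITE and CKSUM hold counters. The names parse_config_rows, COUNTER_COLUMNS and INDENT_STEP are hypothetical.

```python
# Assumptions for this sketch, taken from the sample output only.
COUNTER_COLUMNS = {"READ", "WRITE", "CKSUM"}
INDENT_STEP = 2  # columns of extra indentation per nesting level

config_text = """\
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
"""


def parse_config_rows(text):
    """Return one dict per config row, with int counters and a 'depth' key."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # The header row is indented like the top-level pool row.
    base_indent = len(lines[0]) - len(lines[0].lstrip())
    rows = []
    for line in lines[1:]:
        indent = len(line) - len(line.lstrip())
        row = {}
        for column, value in zip(header, line.split()):
            # Values that are not plain integers are left as strings.
            if column in COUNTER_COLUMNS and value.isdigit():
                row[column] = int(value)
            else:
                row[column] = value
        row["depth"] = (indent - base_indent) // INDENT_STEP
        rows.append(row)
    return rows


rows = parse_config_rows(config_text)
```

Here the pool row gets depth 0, the raidz2-0 vdev depth 1 and the disks depth 2, which is enough to rebuild the tree if needed. The function needs the raw config text with its leading whitespace intact, so it should be fed the lines before any strip() is applied.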