I'm trying to parse the output of the ZFS command zpool status, which gives me output like this:
  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sdf     ONLINE       0     0     0

errors: No known data errors
My goal is to convert this output to a dictionary, like so:
{
    'pool': 'tank',
    'state': 'ONLINE',
    'scan': 'resilvered 35.6G in...',
    'config': 'NAME STATE READ WRITE CKSUM...',
    'errors': 'No known data errors'
}
I'm experiencing two problems that are causing me to write messy code:
- Not every line, such as the scan line, is displayed every time the command is run, and additional lines are possible that are not displayed above
- The config line has a few newlines before its output, which makes splitting difficult
I've tried a few different ways of doing this, but my code gets bogged down with a bunch of conditionals, and since this is Python I figured there must be a cleaner way.
This is the "cleanest" method I've found, but it's not very readable and it doesn't work with the config line:
# output = `zpool status` output
d = {}
for entry in map(lambda x: x.strip(), output.split('\n')):
    if 'state' in entry:
        pool_state = entry.split(' ')
        key = pool_state[0]
        val = pool_state[1]
        d[key] = val
    if 'status' in entry:
        ...
    if 'config' in entry:
        # entry does not contain output of the config: line
        pass
CodePudding user response:
Here is an example.
s = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sdf     ONLINE       0     0     0

errors: No known data errors"""
res = {}
for line in s.splitlines():
    if line == "":  # Ignore everything after the last x: v
        break
    k, v = line.lstrip(" ").split(":", 1)
    if v:
        res[k] = v.lstrip(" ")
Result:
{'pool': 'tank', 'state': 'ONLINE', 'scan': 'resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022'}
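To keep the multi-line config block as well, one option is to treat any line that does not look like "key: value" as a continuation of the most recent key. This is only a minimal sketch: it assumes every key is a single word followed by a colon at the start of a line, and that no continuation line has that shape. The names parse_status, KEY_LINE and sample are made up for this example, and the sample is a shortened version of the output from the question.

```python
import re

# Assumption: a key is one word at the start of a line (after optional
# indentation), immediately followed by a colon.
KEY_LINE = re.compile(r"^\s*(\w+):\s*(.*)$")


def parse_status(output):
    """Collect `key: value` pairs; any other line extends the latest key."""
    result = {}
    current = None
    for line in output.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        match = KEY_LINE.match(line)
        if match:
            current = match.group(1)
            result[current] = match.group(2).strip()
        elif current is not None:
            # Continuation line: append it to the value of the latest key.
            joined = result[current] + "\n" + line.strip()
            result[current] = joined.strip()
    return result


# Shortened sample in the same layout as the question's output.
sample = """  pool: tank
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0

errors: No known data errors"""

parsed = parse_status(sample)
print(parsed["config"].splitlines()[0].split())
```

Because each continuation line is stripped, the indentation that shows the vdev hierarchy inside config is lost; keep the raw line instead if that nesting matters.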
CodePudding user response:
I recommend using re.split to split on the keys (state, scan, and so on) that appear at the beginning of a line followed by a colon, and then converting the pieces to a dictionary using zip.
You can also parse config into a list of dictionaries.
import re
from pprint import pprint

s = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sdf     ONLINE       0     0     0

errors: No known data errors"""


def parse_data(data):
    # Split on "key:" at the start of a line. flags must be passed by keyword:
    # the third positional argument of re.split is maxsplit, not flags.
    parts = re.split(r'(?:\n|^)\s*(\w*):\s*', data.strip(), flags=re.MULTILINE)[1:]
    # parts alternates key, value, key, value, ...
    parsed = dict(zip(parts[::2], parts[1::2]))
    return {
        **parsed,
        'config': parse_config(parsed.get('config', ''))
    }


def parse_config(data):
    # Tokenize each non-empty line; the first line holds the column names.
    lines = [v.strip().split() for v in data.splitlines() if v.strip()]
    if lines:
        return [
            dict(zip(lines[0], v))
            for v in lines[1:]
        ]
    return []


pprint(parse_data(s))
Output should be:
{'config': [{'CKSUM': '0',
             'NAME': 'tank',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'raidz2-0',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sda',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdc',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdb',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdd',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdf',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'}],
 'errors': 'No known data errors',
 'pool': 'tank',
 'scan': 'resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 '
         '2022',
 'state': 'ONLINE'}
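Once config is a list of row dictionaries, checks on individual devices become simple filters. Here is a small sketch, assuming rows in the shape shown above; unhealthy_devices is a hypothetical helper, and the sample rows (including the DEGRADED entry) are made up for illustration rather than taken from real output.

```python
def unhealthy_devices(config_rows):
    """Return the NAME of every row whose STATE is not 'ONLINE'."""
    return [row["NAME"] for row in config_rows if row.get("STATE") != "ONLINE"]


# Hand-written rows in the same shape as the parsed config above.
rows = [
    {"NAME": "tank", "STATE": "ONLINE", "READ": "0", "WRITE": "0", "CKSUM": "0"},
    {"NAME": "sda", "STATE": "DEGRADED", "READ": "0", "WRITE": "0", "CKSUM": "3"},
]
print(unhealthy_devices(rows))
```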