I have a multiline string (sample below) with a table-like structure. I need to parse that structure and convert it into key-value pairs, where each key is a column header and each value is that row's entry in the column. I tried a regex, but it's not working properly.
Here is the string:
Number of Critical alarms: 0
Number of Major alarms: 0
Number of Minor alarms: 0
Slot Sensor Current State Reading Threshold(Minor,Major,Critical,Shutdown)
---------- -------------- --------------- ------------ ---------------------------------------
P0 PEM Iout Normal 5 A na
P0 PEM Vout Normal 12 V DC na
P0 PEM Vin Normal 242 V AC na
P0 Temp: PEM In Normal 34 Celsius (80 ,90 ,95 ,100)(Celsius)
P0 Temp: PEM Out Normal 30 Celsius (80 ,90 ,95 ,100)(Celsius)
R0 Temp: FC FANS Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
P0 Temp: FC FAN0 Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
P1 Temp: FC FAN1 Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
Expected Output:
[{'Slot': 'P0', 'Sensor': 'PEM Iout', 'Current State': 'Normal', 'Reading': '5 A', 'Threshold': 'na'}, ...]
I tried the following regex pattern:
r'^(?P<Slot>[^\s]+)[ \t]+(?P<Sensor>[a-zA-Z0-9:]+ [a-z0-9A-Z.:-]* [a-z0-9]*)[ \t]+(?P<State>[a-zA-Z]*)[ \t]+'
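A pattern that tries to describe the contents of each column is fragile, because sensor names, states and readings all contain single spaces. A minimal sketch of a different regex approach, assuming that in the real output adjacent columns are separated by at least two spaces (the sample above shows single spaces, so this is an assumption), splits each row on runs of two or more whitespace characters. The names parse_rows, KEYS and the padded sample are illustrative only.

```python
import re

# Keys used in the expected output ("Threshold" shortens the real header).
KEYS = ["Slot", "Sensor", "Current State", "Reading", "Threshold"]

# Assumption: columns are separated by two or more spaces, while values
# inside a column ("PEM Iout", "Fan Speed 60%") contain only single spaces.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_rows(text):
    """Return one dict per data row found after the dashes line."""
    lines = text.splitlines()
    # The separator is the first non-blank line made only of dashes and spaces.
    sep = next(i for i, line in enumerate(lines)
               if line.strip() and set(line) <= {"-", " "})
    rows = []
    for line in lines[sep + 1:]:
        if not line.strip():
            continue  # ignore blank lines
        # Caveat: an empty cell would shift the remaining values to the left.
        values = COLUMN_GAP.split(line.strip())
        rows.append(dict(zip(KEYS, values)))
    return rows


sample = (
    "Slot  Sensor         Current State  Reading     Threshold\n"
    "----  -------------  -------------  ----------  ---------\n"
    "P0    PEM Iout       Normal         5 A         na\n"
    "R0    Temp: FC FANS  Fan Speed 60%  23 Celsius  (25 ,35 ,0 )(Celsius)\n"
)
print(parse_rows(sample))
```

This only works if no value contains two consecutive spaces and no cell is empty; otherwise slicing by column position is safer.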
Answer:
If the columns always have the same widths, you can slice each line at fixed character positions:
pfb="""Number of Critical alarms: 0
Number of Major alarms: 0
Number of Minor alarms: 0
Slot Sensor Current State Reading Threshold(Minor,Major,Critical,Shutdown)
---------- -------------- --------------- ------------ ---------------------------------------
P0 PEM Iout Normal 5 A na
P0 PEM Vout Normal 12 V DC na
P0 PEM Vin Normal 242 V AC na
P0 Temp: PEM In Normal 34 Celsius (80 ,90 ,95 ,100)(Celsius)
P0 Temp: PEM Out Normal 30 Celsius (80 ,90 ,95 ,100)(Celsius)
R0 Temp: FC FANS Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
P0 Temp: FC FAN0 Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
P1 Temp: FC FAN1 Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)"""
# skip the three alarm lines, the header line and the dashes line
for line in pfb.splitlines()[5:]:
    slot = line[0:13].strip()
    sensor = line[13:29].strip()
    current = line[29:42].strip()
    reading = line[42:60].strip()
    threshold = line[60:].strip()
    # Use the parts and process some fields further
    ...
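Building on the slicing idea, here is a minimal sketch that turns each sliced row into a dict with the keys from the expected output. The boundaries in COLSPECS (and the function name parse_fixed_width) are hypothetical and chosen to fit the small padded rows built in the example; they would have to be measured against the real output.

```python
# Hypothetical (name, start, end) column boundaries; end=None means "to the
# end of the line". Real boundaries must be measured on the device output.
COLSPECS = [("Slot", 0, 6), ("Sensor", 6, 22), ("Current State", 22, 37),
            ("Reading", 37, 49), ("Threshold", 49, None)]


def parse_fixed_width(lines):
    """Slice each non-blank line at the fixed boundaries and strip padding."""
    return [{name: line[start:end].strip() for name, start, end in COLSPECS}
            for line in lines if line.strip()]


# Padded sample rows whose columns start at 0, 6, 22, 37 and 49.
rows = [
    "P0".ljust(6) + "PEM Iout".ljust(16) + "Normal".ljust(15)
    + "5 A".ljust(12) + "na",
    "R0".ljust(6) + "Temp: FC FANS".ljust(16) + "Fan Speed 60%".ljust(15)
    + "23 Celsius".ljust(12) + "(25 ,35 ,0 )(Celsius)",
]
print(parse_fixed_width(rows))
```

Keeping the names next to the boundaries means the dict keys and the slices cannot drift out of sync.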
Answer:
It's hard to find a usable pattern here except for the line of dashes (---). What I would do is a bit of manual work:
- "Strip" the first rows before the table.
- Find the boundaries (enclosing indexes) of each column from the dashes line.
- Slice each row according to the extracted indexes and strip the surrounding whitespace.
- Save each row to a dict by zipping the headers with the row's values.
import re
s = """Number of Critical alarms: 0
Number of Major alarms: 0
Number of Minor alarms: 0
Slot Sensor Current State Reading Threshold(Minor,Major,Critical,Shutdown)
---------- -------------- --------------- ------------ ---------------------------------------
P0 PEM Iout Normal 5 A na
P0 PEM Vout Normal 12 V DC na
P0 PEM Vin Normal 242 V AC na
P0 Temp: PEM In Normal 34 Celsius (80 ,90 ,95 ,100)(Celsius)
P0 Temp: PEM Out Normal 30 Celsius (80 ,90 ,95 ,100)(Celsius)
R0 Temp: FC FANS Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
P0 Temp: FC FAN0 Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)
P1 Temp: FC FAN1 Fan Speed 60% 23 Celsius (25 ,35 ,0 )(Celsius)"""
# "strip" the first lines
lines = s.splitlines()[3:]
# extract the indexes of the columns according to the dashes line.
# add 0 and None to cover the edges
indexes = [0] + [m.start() for m in re.finditer(r'\s+', lines[1].strip())] + [None]
# zip the indexes into couples of start-finish
start_finish_indexes = list(zip(indexes, indexes[1:]))
# extract the headers according to the indexes
headers = [lines[0][start:finish].strip() for start, finish in start_finish_indexes]
res = []
for line in lines[2:]:
    # same as with the headers
    columns = [line[start:finish].strip() for start, finish in start_finish_indexes]
    # add a dict with keys as headers and the values are the values of the row
    res.append(dict(zip(headers, columns)))
print(res)
Gives:
[{'Slot': 'P0', 'Sensor': 'PEM Iout', 'Current State': 'Normal', 'Reading': '5 A', 'Threshold(Minor,Major,Critical,Shutdown)': 'na'}, ...]
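Note that the last key comes out as 'Threshold(Minor,Major,Critical,Shutdown)' rather than 'Threshold' as in the expected output. A minimal sketch of a variant, assuming the dashes line is the first non-blank line made only of dashes and spaces: it locates that line instead of relying on a fixed offset, takes each column's start from the runs of dashes (re.finditer(r"-+", ...)), reads the headers from the line just above, and drops the trailing "(...)" from each header. The names parse_table, row and WIDTHS and the padded sample are illustrative only.

```python
import re


def parse_table(text):
    """Parse the table under the dashes line into a list of dicts."""
    lines = text.splitlines()
    # The separator is the first non-blank line made only of dashes and spaces.
    sep = next(i for i, line in enumerate(lines)
               if line.strip() and set(line) <= {"-", " "})
    # Each run of dashes marks one column; its start is the column boundary.
    starts = [m.start() for m in re.finditer(r"-+", lines[sep])]
    # Pair each start with the next one; the last column runs to end of line.
    bounds = list(zip(starts, starts[1:] + [None]))
    # The header sits right above the dashes; drop a trailing "(...)" part.
    headers = [lines[sep - 1][a:b].strip().split("(")[0] for a, b in bounds]
    return [dict(zip(headers, (line[a:b].strip() for a, b in bounds)))
            for line in lines[sep + 1:] if line.strip()]


# Hypothetical padded sample: WIDTHS are the first four column widths.
WIDTHS = [6, 15, 15, 12]


def row(*cells):
    # Pad every cell except the last one to its column width.
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS)) + cells[-1]


table_text = "\n".join([
    "Number of Critical alarms: 0",
    row("Slot", "Sensor", "Current State", "Reading",
        "Threshold(Minor,Major,Critical,Shutdown)"),
    row("----", "-" * 13, "-" * 13, "-" * 10, "-" * 40),
    row("P0", "PEM Iout", "Normal", "5 A", "na"),
    row("R0", "Temp: FC FANS", "Fan Speed 60%", "23 Celsius",
        "(25 ,35 ,0 )(Celsius)"),
])
print(parse_table(table_text))
```

Because the boundaries come from the dashes themselves, the same function keeps working if the device changes a column width, as long as the dashes line still matches the data.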