How to parse a multiline string which looks like a table into a list of dictionaries?

Time:12-28

I have the multiline string below, which has a table-like structure. I need to parse it into key-value pairs, where each key is a column header and each value is that row's entry in the column. I tried a regex, but it's not working properly.

Please find the string below:

Number of Critical alarms:  0
Number of Major alarms:     0
Number of Minor alarms:     0

 Slot        Sensor          Current State   Reading        Threshold(Minor,Major,Critical,Shutdown)
 ----------  --------------  --------------- ------------   ---------------------------------------
 P0          PEM Iout        Normal          5    A         na
 P0          PEM Vout        Normal          12   V DC      na
 P0          PEM Vin         Normal          242  V AC      na
 P0          Temp: PEM In    Normal          34   Celsius   (80 ,90 ,95 ,100)(Celsius)
 P0          Temp: PEM Out   Normal          30   Celsius   (80 ,90 ,95 ,100)(Celsius)
 R0          Temp: FC FANS   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)
 P0          Temp: FC FAN0   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)
 P1          Temp: FC FAN1   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)

Expected Output:

[{'Slot': 'P0', 'Sensor': 'PEM Iout', 'Current State': 'Normal', 'Reading': '5 A', 'Threshold': 'na'}, ...]

I have used the below regex pattern:

r'^(?P<Slot>[^\s]+)[ \t]+(?P<Sensor>[a-zA-Z0-9:]+ [a-z0-9A-Z.:-]* [a-z0-9]*)[ \t]+(?P<State>[a-zA-Z]*)[ \t]+'

CodePudding user response:

If the columns always have the same widths:

pfb="""Number of Critical alarms:  0
Number of Major alarms:     0
Number of Minor alarms:     0

 Slot        Sensor          Current State   Reading        Threshold(Minor,Major,Critical,Shutdown)
 ----------  --------------  --------------- ------------   ---------------------------------------
 P0          PEM Iout        Normal          5    A         na
 P0          PEM Vout        Normal          12   V DC      na
 P0          PEM Vin         Normal          242  V AC      na
 P0          Temp: PEM In    Normal          34   Celsius   (80 ,90 ,95 ,100)(Celsius)
 P0          Temp: PEM Out   Normal          30   Celsius   (80 ,90 ,95 ,100)(Celsius)
 R0          Temp: FC FANS   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)
 P0          Temp: FC FAN0   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)
 P1          Temp: FC FAN1   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)"""

for line in pfb.splitlines()[6:]:
  slot      = line[ 0:13].strip()
  sensor    = line[13:29].strip()
  current   = line[29:42].strip()
  reading   = line[42:60].strip()
  threshold = line[60:  ].strip()

  # Use the parts and process some fields further
  ...
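
To get from those slices to the list of dictionaries asked for in the question, the loop body could be filled in roughly as follows. This is only a sketch under the same fixed-width assumption: the `FIELDS` table and the `parse_fixed_width` name are made up for illustration, the offsets are read off the dashes line of the sample, and the "Threshold" key is shortened by hand to match the expected output.

```python
# Hypothetical column table: (key, start, end), offsets taken from the dashes line.
FIELDS = [
    ("Slot", 0, 13),
    ("Sensor", 13, 29),
    ("Current State", 29, 45),
    ("Reading", 45, 60),
    ("Threshold", 60, None),
]

def parse_fixed_width(text, skip=6):
    """Return one dict per data row; `skip` lines precede the first data row."""
    rows = []
    for line in text.splitlines()[skip:]:
        if not line.strip():
            continue  # ignore blank lines
        row = {}
        for name, start, end in FIELDS:
            # split() + join strips the ends and collapses inner runs of spaces
            row[name] = " ".join(line[start:end].split())
        rows.append(row)
    return rows

sample = """Number of Critical alarms:  0
Number of Major alarms:     0
Number of Minor alarms:     0

 Slot        Sensor          Current State   Reading        Threshold(Minor,Major,Critical,Shutdown)
 ----------  --------------  --------------- ------------   ---------------------------------------
 P0          PEM Iout        Normal          5    A         na
 R0          Temp: FC FANS   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)"""

rows = parse_fixed_width(sample)
print(rows[0])
```

Collapsing the inner spaces is what turns the raw `5    A` into the `5 A` shown in the expected output; drop that step if the original spacing matters.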

CodePudding user response:

It's hard to find a usable pattern here except the line of dashes (---). What I would do is a bit of manual work:

import re

s = """Number of Critical alarms:  0
Number of Major alarms:     0
Number of Minor alarms:     0

 Slot        Sensor          Current State   Reading        Threshold(Minor,Major,Critical,Shutdown)
 ----------  --------------  --------------- ------------   ---------------------------------------
 P0          PEM Iout        Normal          5    A         na
 P0          PEM Vout        Normal          12   V DC      na
 P0          PEM Vin         Normal          242  V AC      na
 P0          Temp: PEM In    Normal          34   Celsius   (80 ,90 ,95 ,100)(Celsius)
 P0          Temp: PEM Out   Normal          30   Celsius   (80 ,90 ,95 ,100)(Celsius)
 R0          Temp: FC FANS   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)
 P0          Temp: FC FAN0   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)
 P1          Temp: FC FAN1   Fan Speed 60%   23   Celsius   (25 ,35 ,0  )(Celsius)"""

# "strip" the first lines
lines = s.splitlines()[4:]

# extract the start index of each column from the dashes line
# (the line is left unstripped so the indexes stay aligned with the data rows);
# add None to cover the right edge
indexes = [m.start() for m in re.finditer(r'-+', lines[1])] + [None]
# zip the indexes into couples of start-finish
start_finish_indexes = list(zip(indexes, indexes[1:]))
# extract the headers according to the indexes
headers = [lines[0][start:finish].strip() for start, finish in start_finish_indexes]

res = []
for line in lines[2:]:
    # same as with the headers
    columns = [line[start:finish].strip() for start, finish in start_finish_indexes]
    # add a dict with keys as headers and the values are the values of the row
    res.append(dict(zip(headers, columns)))

print(res)

Gives:

[{'Slot': 'P0', 'Sensor': 'PEM Iout', 'Current State': 'Normal', 'Reading': '5    A', 'Threshold(Minor,Major,Critical,Shutdown)': 'na'}, ...]
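
That result differs from the expected output in two small ways: the reading keeps its inner padding (`5    A` rather than `5 A`) and the last key carries the parenthesised suffix. If an exact match is wanted, a post-processing pass along these lines might do. The `normalise` helper is hypothetical, and it assumes that any trailing `(...)` on a header can be dropped.

```python
import re

def normalise(rows):
    """Collapse runs of whitespace in values and drop a trailing '(...)' from keys."""
    cleaned = []
    for row in rows:
        cleaned.append({
            re.sub(r'\(.*\)$', '', key): re.sub(r'\s+', ' ', value)
            for key, value in row.items()
        })
    return cleaned

# One row in the shape produced by the code above
res = [{'Slot': 'P0', 'Sensor': 'PEM Iout', 'Current State': 'Normal',
        'Reading': '5    A', 'Threshold(Minor,Major,Critical,Shutdown)': 'na'}]

print(normalise(res))
# [{'Slot': 'P0', 'Sensor': 'PEM Iout', 'Current State': 'Normal', 'Reading': '5 A', 'Threshold': 'na'}]
```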