How to parse data stepwise in Python?

I have been trying to parse serial port data that arrives continuously, and I intend to eventually build this parsing into my serial port program. For now I am working on parsing a sample string before trying it on live data. The 'text' string below is an example. I need to recognise that X is the start of a substring, and that a space/EOL is the end of it. I then need to split each substring into its X and Y parts, convert the X value to an integer, and split the characters of the Y part into pairs.

My main issue right now is that I need to do this stepwise: recognise the start (X=) and end (\n) markers of a substring, then parse it to give a value for X and a list of pairs for Y. Once the first substring has been dealt with, I want the program to move on to the next, and to stop when there are no relevant substrings left.

The solution or an approach to this may be easy to find, but I am new to Python and coding, so it's quite a challenge for me. Any advice on what I can learn to help me do this is welcome.

import re
#text is the data I want to parse
text = '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'

a = []
n = 2 #need to break up the Y values into pairs

S = re.findall(r'X=(.*)\n',text) #now a list with X & Y strings

for i in S:
    a =i.split(' Y=') #Split X & Y string into 2 parts
ans = [a[i:i+n] for i in range(0, len(a),n)]

def Extract(ans):
    x = [int(item[0]) for item in ans] #convert X to integer
    y = [re.findall('.{1,2}',item[1]) for item in ans] #split Y data 
                                                        #into pairs
    return x, y

print(Extract(ans))

>>>> Example of what I want the output to look like <<<<

Substring 1:
[X = -1323] 
[Y = 'AA', '6D', '87', 'CB', '78', 'F8', 'EE']
(substring 1 is complete)


Substring 2:
[X = -908] 
[Y = 'C8', '7F', '32', 'E6', '76', '7']

CodePudding user response：

You can try extracting them directly using regex. Here is my attempt.

import re

text = '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'
regex = r'X=(-?\d+) Y=([A-Z0-9]+)'

result = re.findall(regex, text)  # [('-1323', 'AA6D87CB78F8EE'), ('-908', 'C87F32E6767'), ('-87', 'AB67C78E23')]

X = [int(i[0]) for i in result]  # [-1323, -908, -87]
Y = [i[1] for i in result]  # ['AA6D87CB78F8EE', 'C87F32E6767', 'AB67C78E23']
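
To get the Y values as pairs and handle the substrings one at a time, as the question asks, one option is to walk the matches with re.finditer and slice each Y string. This is only a minimal sketch that reuses the regex above; the function name parse_records and the print layout are illustrative choices, not part of the original answer.

```python
import re

text = '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'
regex = re.compile(r'X=(-?\d+) Y=([A-Z0-9]+)')


def parse_records(data):
    """Yield (x, pairs) for each 'X=... Y=...' substring, one at a time."""
    for match in regex.finditer(data):
        x = int(match.group(1))
        y = match.group(2)
        # Slice Y into 2-character chunks; an odd-length Y leaves a
        # single trailing character as the last item.
        pairs = [y[i:i + 2] for i in range(0, len(y), 2)]
        yield x, pairs


for number, (x, pairs) in enumerate(parse_records(text), start=1):
    print(f'Substring {number}:')
    print(f'[X = {x}]')
    print('[Y = ' + ', '.join(repr(p) for p in pairs) + ']')
```

Because parse_records is a generator, each substring is parsed only when the loop asks for the next one, and the loop ends once no further matches are found.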

CodePudding user response：

Here's what I would do:

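import re

# Raw string: each \n here is a literal backslash followed by 'n', not a newline
input_string = r'00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'

regex = r'(?:.*\s)?X=(?P<X>-?\d+)\sY=(?P<Y>.*)'

# Split on the literal two-character sequence \n to match the raw string above
parts = input_string.split(r'\n')

extracted_vals = []

for part in parts:
    m = re.match(regex, part)
    if m is None:
        # No match, ignore
        continue
    # Extract X and Y
    X, Y = m.group('X'), m.group('Y')
    # Convert X to an integer and slice Y into 2-character pairs
    extracted_vals.append(dict(X=int(X),
                               Y=[Y[i:i + 2] for i in range(0, len(Y), 2)]))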

Result:

[{'X': -1323, 'Y': ['AA', '6D', '87', 'CB', '78', 'F8', 'EE']},
 {'X': -908, 'Y': ['C8', '7F', '32', 'E6', '76', '7']},
 {'X': -87, 'Y': ['AB', '67', 'C7', '8E', '23']}]
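
For data that keeps arriving, as it would from a serial port, the same idea can be applied by buffering incoming chunks and parsing a line only once its newline has shown up. The following is a rough sketch under stated assumptions: the chunks list merely simulates serial reads, and iter_records, RECORD and chunks are hypothetical names introduced for illustration.

```python
import re

RECORD = re.compile(r'X=(-?\d+) Y=(\S+)')


def iter_records(chunks):
    """Yield {'X': int, 'Y': [pairs]} as complete lines become available."""
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        # Only lines terminated by '\n' are complete; keep the rest buffered.
        while '\n' in buffer:
            line, buffer = buffer.split('\n', 1)
            m = RECORD.search(line)
            if m is None:
                continue  # no X=/Y= substring on this line
            y = m.group(2)
            yield {'X': int(m.group(1)),
                   'Y': [y[i:i + 2] for i in range(0, len(y), 2)]}


# Simulated input (an assumption): the sample text cut at arbitrary points,
# the way it might arrive from a port.
chunks = ['00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-13',
          '23 Y=AA6D87CB',
          '78F8EE\nX=-908 Y=C87F32E6767\nX=',
          '-87 Y=AB67C78E23\n']

for record in iter_records(chunks):
    print(record)
```

In a real program the chunks would come from decoded reads of the serial port rather than a list. A record that is cut in the middle stays in the buffer until the rest of its line arrives.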