I have been trying to parse serial port data that arrives continuously. I eventually intend to build this parsing into my serial port program. For now I am working on a sample string before trying it on continuously arriving data. The 'text' string below is an example. I need to recognise that X marks the start of a substring and that a space/EOL marks the end of it. I then need to split each substring into its X and Y parts and convert the X value to an integer. Finally, I need to split the characters in the Y part into pairs.
My main issue right now is that I need to do this stepwise: recognise the start (X=) and end (\n) markers of a substring, parse it to give a value for X and a list of pairs for Y, then, once the first substring has been dealt with, move on to the next one and stop when there are no relevant substrings left.
A solution or approach may be easy to find, but I am new to Python and coding, so it's quite a challenge for me. Any advice on what I can learn to help me do this is welcome.
import re

# text is the data I want to parse
text = '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'
a = []
n = 2  # need to break up the Y values into pairs
S = re.findall(r'X=(.*)\n', text)  # now a list with X & Y strings
for i in S:
    a = i.split(' Y=')  # split X & Y string into 2 parts
ans = [a[i:i + n] for i in range(0, len(a), n)]

def Extract(ans):
    x = [int(item[0]) for item in ans]  # convert X to integer
    y = [re.findall('.{1,2}', item[1]) for item in ans]  # split Y data into pairs
    return x, y

print(Extract(ans))
Example of what I want the output to look like:
Substring 1:
[X = -1323]
[Y = 'AA', '6D', '87', 'CB', '78', 'F8', 'EE']
(substring 1 is complete)
Substring 2:
[X = -908]
[Y = 'C8', '7F', '32', 'E6', '76', '7']
CodePudding user response:
You can try extracting them directly using regex. Here is my attempt.
import re
text = '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'
regex = r'X=(-?\d+) Y=([A-Z0-9]+)'
result = re.findall(regex, text) # [('-1323', 'AA6D87CB78F8EE'), ('-908', 'C87F32E6767'), ('-87', 'AB67C78E23')]
X = [int(i[0]) for i in result] # [-1323, -908, -87]
Y = [i[1] for i in result] # ['AA6D87CB78F8EE', 'C87F32E6767', 'AB67C78E23']
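To get the pairs the question asks for, each Y string can then be sliced two characters at a time, and re.finditer can be used to walk through the matches one by one. The following is only a minimal sketch along those lines: split_pairs and records are hypothetical names that are not part of the answer above, and the printed layout is a guess at the format shown in the question.

```python
import re

text = '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'

def split_pairs(s, n=2):
    """Slice s into chunks of n characters; the last chunk may be shorter."""
    return [s[i:i + n] for i in range(0, len(s), n)]

# finditer yields one match at a time, so each substring is handled in turn
records = []
for m in re.finditer(r'X=(-?\d+) Y=([A-Z0-9]+)', text):
    records.append((int(m.group(1)), split_pairs(m.group(2))))

# Print each parsed substring in roughly the layout from the question
for number, (x, y) in enumerate(records, start=1):
    print(f'Substring {number}:')
    print(f'[X = {x}]')
    print(f'[Y = {", ".join(repr(p) for p in y)}]')
```

A Y string with an odd number of characters, such as the second one here, ends with a single-character chunk.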
CodePudding user response:
Here's what I would do:
import re

# Raw string: each \n here is a literal backslash followed by 'n', not a newline
input_string = r'00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-1323 Y=AA6D87CB78F8EE\nX=-908 Y=C87F32E6767\nX=-87 Y=AB67C78E23\n'
regex = r'(?:.*\s)?X=(?P<X>-?\d+)\sY=(?P<Y>.*)'
# Split on the same literal backslash-n sequence
parts = input_string.split(r'\n')
extracted_vals = []
for part in parts:
    m = re.match(regex, part)
    if m is None:
        # No match, ignore
        continue
    # Extract X and Y
    X, Y = m.groupdict().values()
    extracted_vals.append(dict(X=int(X),
                               Y=[Y[i:i + 2] for i in range(0, len(Y), 2)]))
Result:
[{'X': -1323, 'Y': ['AA', '6D', '87', 'CB', '78', 'F8', 'EE']},
{'X': -908, 'Y': ['C8', '7F', '32', 'E6', '76', '7']},
{'X': -87, 'Y': ['AB', '67', 'C7', '8E', '23']}]
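Since the data will eventually arrive continuously, one possible way to carry the same idea over is to buffer incoming text and only parse lines that have already been terminated by a newline. This is a rough sketch, not a tested serial port implementation: parse_stream, RECORD and the simulated chunks list are assumptions made for illustration, and the chunks stand in for whatever the serial read loop actually returns (decoded to str).

```python
import re

RECORD = re.compile(r'X=(-?\d+) Y=([A-Z0-9]+)')

def parse_stream(chunks):
    """Yield (x, pairs) for each complete line seen in an iterable of text chunks."""
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        # Only lines terminated by '\n' are complete; keep the tail for later
        *lines, buffer = buffer.split('\n')
        for line in lines:
            m = RECORD.search(line)
            if m is None:
                continue  # this line holds no X=/Y= record
            y = m.group(2)
            yield int(m.group(1)), [y[i:i + 2] for i in range(0, len(y), 2)]

# Simulated input: the sample text arriving in arbitrary pieces
chunks = [
    '00000:\nHHBBUUSSXXNJJDHCBSOXMJ X=-13',
    '23 Y=AA6D87',
    'CB78F8EE\nX=-908 Y=C87F32E6767\nX=',
    '-87 Y=AB67C78E23\n',
]
results = list(parse_stream(chunks))
```

Because parse_stream is a generator, each record is produced as soon as its line is complete, and a record that is cut in the middle of a chunk simply waits in the buffer until the rest arrives.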