I am having trouble finding a clear and readable solution to my problem. I have some messages that all have the same format:
message = "sender -> receiver : message_name [param1 : value1, ..., param_n : value_n] \
Loop : value Par : value Par_id : value"
The messages are generated, so they always have the same format. The only things that change from one message to another are the values of Par, Par_id and Loop (always integers) and the number of parameters, which can be zero (`[]`) or several (`[Param1 : value1, Param2 : value2, Param3 : value3]`).
Obviously the message name, sender and receiver are not always the same, but I guess that doesn't really matter because it doesn't affect the way the string is constructed.
For instance one message could be :
'c -> i : C_I_status [eStatus : 1, eSceEnd : 4] Loop : 0 Par : 0 Par_id : -1'
I would like to format these messages in the following format :
[sender, receiver, message_name, [ [param1, value1], ..., [param_n, value_n] ], loop_value, par_value, par_id_value]
With the example above I would have the output :
['c', 'i', 'C_I_status', [ ['eStatus', '1'], ['eSceEnd', '4'] ], '0', '0', '-1']
So far I have tried split and strip, but I am getting confused by all the list items and the different ways of cutting the string to get the correct format. Here is what I came up with (it gets messy when it comes to the parameters):
sender = line.split(' ', 6)[0]
receiver = line.split(' ', 6)[2]
message_name = line.split(' ', 6)[4]
params = [i.split(':') for i in line.split(' ', 6)[5].strip('[]').split(',')]
looped = line.split(' ', 6)[6].split(' ')[2]
pared = line.split(' ', 6)[6].split(' ')[5]
pared_id = line.split(' ', 6)[6].split(' ')[8]
message_stack.append([sender, receiver, message_name, params, looped, pared])
Thank you all :)
CodePudding user response:
You can definitely use regex, but it is probably good to know if that is always the format (is the message code generated? are the spaces always the same? are `Loop`, `Par`, and `Par_id` always there? are the numbers always integers?). If so, the regex can be quite simple (not sure about the readability):
import re
s = 'c -> i : C_I_status [eStatus : 1, eSceEnd : 4] Loop : 0 Par : 0 Par_id : -1'
# Extract all the required information to a list
match = re.findall(r'(\w+) -> (\w+) : (\w+) (\[.*\]) \w+ : (-*\d+) \w+ : (-*\d+) \w+ : (-*\d+)', s)
if match:  # If the message is well formatted, it should have 1 element inside
    parts = list(match[0])
    # Deal with the information between `[]`
    parts[3] = [x.split(' : ') for x in re.findall(r'\w+ : -*\d+', parts[3])]
    # ['c', 'i', 'C_I_status', [['eStatus', '1'], ['eSceEnd', '4']], '0', '0', '-1']
Some of the key things you need to know:
- `\w+` matches word characters, i.e. letters, digits and underscores (the plus means 1 or more times)
- `-*\d+` matches digits (the `-*` is an optional negative sign)
- `\[.*\]` matches anything (`.*`) between `[]`
- The `()` are capture groups (i.e. only things between `()` are returned)
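To see these pieces in isolation, here is a small sketch that runs each one on its own. The sample strings and variable names are made up for illustration.

```python
import re

# \w+ : one or more word characters (letters, digits, underscore)
words = re.findall(r'\w+', 'C_I_status -> i2')
print(words)    # ['C_I_status', 'i2']

# -*\d+ : digits with an optional leading minus sign
numbers = re.findall(r'-*\d+', 'Par : 0 Par_id : -1')
print(numbers)  # ['0', '-1']

# \[.*\] : everything from the first '[' to the last ']'
bracket = re.findall(r'\[.*\]', 'x [a : 1, b : 2] y')
print(bracket)  # ['[a : 1, b : 2]']

# With capture groups, findall returns only the captured parts
pairs = re.findall(r'(\w+) : (-*\d+)', 'a : 1, b : -2')
print(pairs)    # [('a', '1'), ('b', '-2')]
```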
You can also adapt that regex for variable spaces, decimal numbers, etc. but it gets a bit more complex.
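As a rough sketch of such an adaptation, the pattern below tolerates variable spacing with `\s*` and accepts decimal values. The names `NUM`, `MESSAGE_RE`, `PARAM_RE` and `parse_message` are hypothetical, and it assumes the `Loop`, `Par` and `Par_id` labels are always present in that order.

```python
import re

# Assumption: a number is an optional minus sign, digits, and an optional decimal part
NUM = r'-?\d+(?:\.\d+)?'
MESSAGE_RE = re.compile(
    r'\s*(\w+)\s*->\s*(\w+)\s*:\s*(\w+)\s*\[(.*)\]'
    rf'\s*Loop\s*:\s*({NUM})\s*Par\s*:\s*({NUM})\s*Par_id\s*:\s*({NUM})\s*'
)
PARAM_RE = re.compile(rf'(\w+)\s*:\s*({NUM})')

def parse_message(line):
    """Return the parsed list, or None if the line does not match."""
    m = MESSAGE_RE.fullmatch(line)
    if m is None:
        return None
    sender, receiver, name, params, loop, par, par_id = m.groups()
    # An empty parameter list ("[]") yields an empty list here
    return [sender, receiver, name,
            [list(p) for p in PARAM_RE.findall(params)],
            loop, par, par_id]

print(parse_message('c -> i : C_I_status [eStatus : 1, eSceEnd : 4] Loop : 0 Par : 0 Par_id : -1'))
```

Using `fullmatch` means a malformed line gives `None` rather than a partial result, which may or may not be what you want.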
CodePudding user response:
Here is a solution that combines regular expressions and the `split()` string method to achieve the desired output:
import re
s = 'c -> i : C_I_status [eStatus : 1] Loop : 0 Par : 0 Par_id : -1'
s = s.split(":")
s = s[0].split("->") + s[1:]
s[4:6] = [re.sub("[^0-9]", "", i) for i in s[4:6]]
s[2:4] = [s[2].split("[")[0], [[s[2].split("[")[1], s[3].split("]")[0]]]]
print(s)
Output:
['c ', ' i ', ' C_I_status ', [['eStatus ', ' 1']], '0', '0', ' -1']
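The output above still carries the surrounding spaces, and the example message has only one parameter. A possible variant using only `split()` and `strip()` that trims the whitespace and handles zero or several parameters is sketched below. The function name `parse_with_split` is made up, and it assumes the fixed `Loop : x Par : y Par_id : z` tail from the question.

```python
def parse_with_split(line):
    # Separate "sender -> receiver : name " from "params] Loop : ..."
    head, rest = line.split('[', 1)
    params_text, tail = rest.rsplit(']', 1)

    route, name = head.split(':', 1)
    sender, receiver = route.split('->')

    # Empty brackets give no parameters at all
    params = [[part.strip() for part in item.split(':')]
              for item in params_text.split(',') if item.strip()]

    # tail looks like " Loop : 0 Par : 0 Par_id : -1"; the values are tokens 2, 5 and 8
    tokens = tail.split()
    return [sender.strip(), receiver.strip(), name.strip(),
            params, tokens[2], tokens[5], tokens[8]]

print(parse_with_split('c -> i : C_I_status [eStatus : 1, eSceEnd : 4] Loop : 0 Par : 0 Par_id : -1'))
# ['c', 'i', 'C_I_status', [['eStatus', '1'], ['eSceEnd', '4']], '0', '0', '-1']
```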