Splitting a string with regex

I have the following input and want to produce the output below using regex. I would appreciate your help.

input:

'00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20'

desired output:

['00:00:00:0000', 'Rx', '2', '0x064', 's', '8', '20 20 20 20 20 20 20 20']

i.e., I want every word to be its own token, except for the last eight strings, which should be grouped together into a single token.

CodePudding user response：

I would use an re.findall approach here:

import re

inp = '00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20'
# Six single-word fields, then the trailing run of numbers captured as one group
parts = re.findall(r'(\d{2}:\d{2}:\d{2}:\d+) (\w+) (\d+) (\d+x\d+) (\w+) (\d+) (\d+(?: \d+)*)', inp)
print(parts)

This prints:

[('00:00:00:0000', 'Rx', '2', '0x064', 's', '8', '20 20 20 20 20 20 20 20')]
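
Note that re.findall returns a list containing one tuple, not the flat list shown in the question. A minimal sketch for getting a flat list, assuming each input line holds exactly one record in this format (the names PATTERN and tokenize are illustrative, not from the original answer):

```python
import re

# Same pattern as above: six single-word fields, then a run of
# space-separated numbers captured as one group.
PATTERN = re.compile(
    r'(\d{2}:\d{2}:\d{2}:\d+) (\w+) (\d+) (\d+x\d+) (\w+) (\d+) (\d+(?: \d+)*)'
)

def tokenize(line):
    """Return the seven fields as a list, or None if the line does not match."""
    m = PATTERN.fullmatch(line)
    return list(m.groups()) if m else None

inp = '00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20'
print(tokenize(inp))
```

This pattern only accepts decimal digits. If the ID or the trailing values can contain hex letters (an assumption about the data, not something the question states), \d would need to become something like [0-9A-Fa-f].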

CodePudding user response：

I don't know how general the solution needs to be, but consider the requirement as you described it:

"except for the eight last strings to be in their own token together"

As the problem is posed, this does not need a regex.

You can get the desired result with str.split, limiting the number of splits so that the last eight words stay together:

s = "00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20"
# Split on every space except the last 7, so the final eight words stay in one token
print(s.split(" ", s.count(" ") - 7))

The re module makes the splitting more flexible, for example when tokens are separated by multiple spaces:

import re
s = "00:00:00:0000  Rx 2  0x064    s 8 20 20 20 20   20       20 20 20"
# Split on runs of spaces, stopping before the last 7 separators
print(re.split("[ ]+", s, maxsplit=len(re.findall("[ ]+", s)) - 7))
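
One thing to keep in mind: limiting the number of splits leaves the original spacing inside the last token untouched. If the last token should be normalized to single spaces, a possible sketch without regex is to split on any whitespace and rejoin the tail (the helper split_keep_tail and its tail parameter are hypothetical names used only for illustration):

```python
def split_keep_tail(line, tail=8):
    """Split on any whitespace, then rejoin the last `tail` words into one token."""
    words = line.split()
    if len(words) <= tail:
        # Not enough words for separate leading tokens: everything is the tail
        return [" ".join(words)] if words else []
    return words[:-tail] + [" ".join(words[-tail:])]

s = "00:00:00:0000  Rx 2  0x064    s 8 20 20 20 20   20       20 20 20"
print(split_keep_tail(s))
# ['00:00:00:0000', 'Rx', '2', '0x064', 's', '8', '20 20 20 20 20 20 20 20']
```

Because str.split() with no arguments treats any run of whitespace as one separator, this also handles tabs and leading or trailing spaces.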