I have the following input and output that I wish to achieve using regex. I would be happy for your assistance.
input:
'00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20'
desired output:
['00:00:00:0000', 'Rx', '2', '0x064', 's', '8', '20 20 20 20 20 20 20 20']
i.e., I want every word to be its own token, except for the last eight strings, which should be grouped together in a single token.
CodePudding user response:
I would use an re.findall approach here:
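import re
inp = '00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20'
# One capture group per field; the last group takes the trailing run of space-separated numbers
parts = re.findall(r'(\d{2}:\d{2}:\d{2}:\d+) (\w+) (\d+) (\d+x\d+) (\w+) (\d+) (\d+(?: \d+)*)', inp)
print(parts)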
This prints:
[('00:00:00:0000', 'Rx', '2', '0x064', 's', '8', '20 20 20 20 20 20 20 20')]
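Note that re.findall returns a list containing one tuple here, not the flat list asked for in the question. A minimal sketch of one way to get the flat list, assuming each input is a single line in exactly this layout: match the whole line with re.fullmatch and convert the groups to a list. The name LINE_RE is made up for illustration, and widening the hex field to accept A-F digits is an assumption about the data, not something the question states.

```python
import re

# LINE_RE is a hypothetical name. The 0x field accepts hex digits A-F
# as well, which is an assumption about the data.
LINE_RE = re.compile(
    r'(\d{2}:\d{2}:\d{2}:\d+) (\w+) (\d+) (0x[0-9A-Fa-f]+) (\w+) (\d+) (\d+(?: \d+)*)'
)

inp = '00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20'

# fullmatch succeeds only if the entire line fits the pattern
m = LINE_RE.fullmatch(inp)

# groups() yields one string per capture group; fall back to an empty list
parts = list(m.groups()) if m else []
print(parts)
```

A line that does not fit the layout gives an empty list instead of raising, so malformed input is easy to detect.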
CodePudding user response:
I don't know how general the solution needs to be, but consider the requirement as you described it:
"except for the eight last strings to be in their own token together"
To me, this requirement does not need a regex solution, given how the problem is posed.
You could achieve what you want with str.split and its maxsplit argument:
s = "00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20"
# 13 spaces in total; stopping after 13 - 7 = 6 splits keeps the last eight values together
s.split(" ", s.count(" ")-7)
You could use the re package to make your splitting more flexible, for example when you have multiple spaces between the tokens:
import re
s = "00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20"
# "[ ]+" matches a run of one or more spaces, so repeated spaces count as one separator
re.split("[ ]+", s, maxsplit=len(re.findall("[ ]+", s))-7)
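If counting separators feels fragile, the same idea can be sketched as a small helper that splits on any whitespace and then joins the last few tokens back together. The function name split_keep_tail and its tail parameter are hypothetical, and the sketch assumes the trailing group always has a known size (eight here, at least one in general).

```python
def split_keep_tail(line, tail=8):
    """Split on whitespace and join the last `tail` tokens into one string.

    Hypothetical helper; assumes tail >= 1.
    """
    tokens = line.split()          # no argument: any run of whitespace separates
    cut = len(tokens) - tail       # index where the trailing group starts
    if cut <= 0:
        # Fewer tokens than the tail size: everything becomes one token
        return [" ".join(tokens)] if tokens else []
    return tokens[:cut] + [" ".join(tokens[cut:])]


s = "00:00:00:0000 Rx 2 0x064 s 8 20 20 20 20 20 20 20 20"
result = split_keep_tail(s)
print(result)
```

Unlike the re.split version, this also collapses repeated spaces inside the trailing group to single spaces, which may or may not be what you want.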