I'm trying to split a string into an initial numeric component and the remaining components, using the first occurrence of a letter as the delimiter, so for example:
"123b" -> ["123", "b"]
"12b7.97ap" -> ["12", "b7.97ap"]
I'm new to regex and I'm having trouble achieving this... The best I could do is:
re.split(r"(\d+)", string)
But this returns:
["", "123", "b"]
["", "12", "b", "7", ".", "97", "ap"]
For the two examples above. I suppose I could then combine all the elements after index 1 into a single string, but I'm sure there's a better way... Thanks in advance for any help!
CodePudding user response:
One possibility, first matching digits and then matching the rest:
import re
for s in "123b", "12b7.97ap":
    # \d+ grabs the leading digits; .+ then grabs everything that is left
    print(re.findall(r'\d+|.+', s))
Output:
['123', 'b']
['12', 'b7.97ap']
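To see how the alternation behaves on other inputs, here is a small sketch. The helper name digits_then_rest and the two extra inputs ("123" and "b12") are only illustrative assumptions, not part of the answer above. It shows that the result has a single element when the string is all digits or does not start with a digit, so the caller may need to check the length.

```python
import re

def digits_then_rest(s):
    # At each position \d+ is tried first; if it fails, .+ consumes
    # the rest of the line in one go.
    return re.findall(r'\d+|.+', s)

for s in ("123b", "12b7.97ap", "123", "b12"):
    print(repr(s), '->', digits_then_rest(s))
```

Note that . does not match a newline by default, so this assumes single-line input.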
CodePudding user response:
You can write:
import re
s = '12b7.97ap'
re.split(r'(?<=\d)(?!\d)', s, 1)
# => ['12', 'b7.97ap']
split's optional third argument, maxsplit (here 1), is the maximum number of splits to perform. Splitting on a zero-width match like this requires Python 3.7 or later.
The regular expression matches the first zero-width location (think of a position between successive characters) that follows a digit (\d) and does not precede a digit. (?<=\d) is a positive lookbehind; (?!\d) is a negative lookahead.
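As a rough illustration of what "zero-width location" means, the sketch below lists every position where the pattern matches and compares splitting with and without a limit. The variable names are my own, and it assumes Python 3.7+ so that re.split accepts empty matches.

```python
import re

pattern = re.compile(r'(?<=\d)(?!\d)')
text = '12b7.97ap'

# Every index that follows a digit and is not followed by a digit.
positions = [m.start() for m in pattern.finditer(text)]
print(positions)

# Without a limit, the string is cut at each of those positions.
print(pattern.split(text))

# With maxsplit=1, only the first position is used.
print(pattern.split(text, maxsplit=1))
```

The matches consume no characters, which is why nothing is lost from the pieces.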
This solution does not require the string to begin with a digit. For example:
s = 'Prefix 12b7.97ap'
re.split(r'(?<=\d)(?=\D)', s, 1)
# => ['Prefix 12', 'b7.97ap']
CodePudding user response:
I would recommend reading the Python docs on groups in regular expressions. You can form groups within a match by wrapping sub-expressions in parentheses, and then reference what each group matched. A minimal, fully working example might look like this:
import sys
import re

if __name__ == '__main__':
    for line in sys.stdin:
        line = line.rstrip()
        if line in ['exit', 'q']:
            sys.exit()
        # group 1: the first run of digits, group 2: everything after it
        match = re.search(r'(\d+)(.+)', line)
        if match:
            print(f"group1: {match.group(1)} | group2: {match.group(2)}")
        else:
            print("no match")
Considering "good, Pythonic" solutions or better performance may come in handy in the long run, but you should not worry too much about that in the beginning.
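If you want the group-based approach as a reusable function, one possible sketch follows. The name split_numeric_prefix is hypothetical, and it makes two assumptions that differ from the script above: it uses re.match so the digits must be at the start of the string, and it uses .* so the remainder is allowed to be empty.

```python
import re

def split_numeric_prefix(s):
    """Return (digits, rest), or None if s does not start with a digit."""
    # re.match anchors at the start of the string; re.DOTALL lets the
    # remainder include newlines.
    match = re.match(r'(\d+)(.*)', s, re.DOTALL)
    if match is None:
        return None
    return match.group(1), match.group(2)

for s in ("123b", "12b7.97ap", "123", "b12"):
    print(repr(s), '->', split_numeric_prefix(s))
```

Returning None for a non-matching input keeps the caller from hitting an AttributeError on match.group.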