Python Regex: Split on first occurrence of letter

Time:02-20

I'm trying to split a string into an initial numeric component and the remaining components, using the first occurrence of a letter as the delimiter, so for example:

"123b" -> ["123", "b"]
"12b7.97ap" -> ["12", "b7.97ap"]

I'm new to regex and I'm having trouble achieving this... The best I could do is:

re.split(r"(\d+)", string)

But this returns:

["", "123", "b"]
["", "12", "b", "7", ".", "97", "ap"]

For the two examples above. I suppose I could then combine all the elements after index 1 into a single string, but I'm sure there's a better way... Thanks in advance for any help!

CodePudding user response:

One possibility, first matching digits and then matching the rest:

import re

for s in "123b", "12b7.97ap":
    print(re.findall(r'\d+|.+', s))

Output:

['123', 'b']
['12', 'b7.97ap']

CodePudding user response:

You can write:

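import re

# Requires Python 3.7+, where re.split can split on zero-width matches.
s = '12b7.97ap'  # named s rather than str, which would shadow the built-in
re.split(r'(?<=\d)(?!\d)', s, maxsplit=1)
  #=> ['12', 'b7.97ap']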


split's optional third argument, maxsplit (here 1), is the maximum number of splits to perform.

The regular expression matches the first zero-width location (think of the gaps between successive characters) that follows a digit (\d) and is not followed by a digit. (?<=\d) is a positive lookbehind; (?!\d) is a negative lookahead.
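
To see which positions the pattern can match, and why the maxsplit limit matters, here is a small sketch. The variable names are only illustrative, and it assumes Python 3.7 or later, where zero-width matches can be used for splitting:

```python
import re

pattern = re.compile(r'(?<=\d)(?!\d)')
text = '12b7.97ap'  # example string from the question

# Every zero-width position that follows a digit and is not followed by one.
positions = [m.start() for m in pattern.finditer(text)]
print(positions)

# Without a limit, the string is split at each of those positions.
unlimited = pattern.split(text)
print(unlimited)

# With maxsplit=1, only the first position is used.
first_only = pattern.split(text, maxsplit=1)
print(first_only)
```

The pattern matches in several places, so it is the maxsplit limit that keeps the remainder in one piece.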

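This solution does not require the string to begin with a digit. For example (here with the positive lookahead (?=\D), which splits at the same place whenever a non-digit follows the digits):

s = 'Prefix 12b7.97ap'
re.split(r'(?<=\d)(?=\D)', s, maxsplit=1)
  #=> ['Prefix 12', 'b7.97ap']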
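
The two lookaheads used in this answer, (?!\d) and (?=\D), should only differ when the digits run to the end of the string. A quick sketch of that edge case follows; the input '123' is a made-up example, not one from the question:

```python
import re

digits_only = '123'  # hypothetical input with nothing after the digits

# (?!\d) also succeeds at the end of the string, so a split still happens
# and leaves an empty second element.
negative = re.split(r'(?<=\d)(?!\d)', digits_only, maxsplit=1)
print(negative)

# (?=\D) needs an actual non-digit character, so nothing is split.
positive = re.split(r'(?<=\d)(?=\D)', digits_only, maxsplit=1)
print(positive)
```

Which behaviour is preferable depends on whether the caller always expects two elements back.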

CodePudding user response:

I would recommend reading the Python docs on groups in regular expressions. You can form groups within a match by wrapping sub-patterns in parentheses, and then reference what each group matched. A minimal, fully working example might look like this:

import sys
import re

if __name__ == '__main__':
    for line in sys.stdin:
        line = line.rstrip()
        if line in ['exit', 'q']:
            sys.exit()
        # group 1: the first run of digits; group 2: everything after it
        match = re.search(r'(\d+)(.+)', line)
        if match is None:
            print("no match")
            continue
        print(f"group1: {match.group(1)} | group2: {match.group(2)}")

Considering "good Pythonic" solutions or better performance may come in handy in the long run, but you should not worry too much about it in the beginning.
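
The grouping idea can also be wrapped in a small helper. This is only a sketch: the function name, the pattern name, and the group names are hypothetical, and it assumes the number must sit at the very start of the string, as in the question's examples:

```python
import re
from typing import Optional, Tuple

# Leading digits, then whatever follows (possibly nothing).
# re.DOTALL lets "." also cover newlines in the remainder.
LEADING_NUMBER = re.compile(r'(?P<number>\d+)(?P<rest>.*)', re.DOTALL)


def split_leading_number(text: str) -> Optional[Tuple[str, str]]:
    """Return (number, rest), or None if text does not start with a digit."""
    match = LEADING_NUMBER.match(text)  # match() only tries the start of text
    if match is None:
        return None
    return match.group('number'), match.group('rest')


print(split_leading_number('123b'))
print(split_leading_number('12b7.97ap'))
print(split_leading_number('abc'))
```

Returning None for input without a leading digit makes the no-match case explicit instead of raising an AttributeError on match.group.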
