Match expression with regex until - followed by alphabet?

Time:05-06

I have strings with multiple dash characters, and I want to get the expression up to a dash only if that dash is followed by a letter and not by a digit.

I have used

re.search("^([^-])+","3x130-140k-ZZ-ABC")

but it returns 3x130, which is everything up to the first dash. I want 3x130-140k, because only the second dash is followed by a letter.

I want a regex which returns 3x130-140k from 3x130-140k-ZZ-ABC and 3x140k from 3x140k-ZZ-ABC.

CodePudding user response:

You can use

^.*?(?=-[A-Za-z])

Details:

  • ^ - start of string
  • .*? - zero or more chars other than line break chars, as few as possible
  • (?=-[A-Za-z]) - a location that is immediately followed by - and an ASCII letter (use (?=-[^\W\d_]) to match any Unicode letter).
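
To make the ASCII versus Unicode note concrete, here is a minimal sketch that runs both lookaheads on the same input. The helper name prefix and the sample string with a non-ASCII letter are made up for illustration; they are not part of the original answer.

```python
import re

ASCII_PATTERN = re.compile(r"^.*?(?=-[A-Za-z])")
UNICODE_PATTERN = re.compile(r"^.*?(?=-[^\W\d_])")

def prefix(pattern, text):
    """Return the text before the first dash-plus-letter, or None if there is none."""
    m = pattern.search(text)
    return m.group() if m else None

# Hypothetical sample: the only dash-plus-letter uses a non-ASCII letter (\u00c9 is "E" with an acute accent).
sample = "3x130-140k-\u00c9T"
print(prefix(ASCII_PATTERN, sample))    # None: no dash is followed by an ASCII letter
print(prefix(UNICODE_PATTERN, sample))  # 3x130-140k
```

With the strings from the question, both patterns give the same result, since all the letters there are ASCII.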

See the Python demo:

import re
text = "3x130-140k-ZZ-ABC"
m = re.search(r"^.*?(?=-[A-Za-z])", text)
if m:
    print(m.group()) # => 3x130-140k

CodePudding user response:

You can use -[a-zA-Z] to match a dash and then a letter. Then you can take the portion of the string before this match:

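import re
raw_string = "3x130-140k-ZZ-ABC"
match = re.search(r"-[a-zA-Z]", raw_string)  # [a-zA-Z], not [a-zA-z], which also matches [ \ ] ^ _ `
if match:  # re.search returns None when no dash is followed by a letter
    print(raw_string[:match.start()])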

Output:

3x130-140k

CodePudding user response:

No need for a regexp. Use str.split(txt, '-') to split on dashes, then use next to find the index of the first dash followed by a word that satisfies word.isalpha().

def eat_until_dash_alphaword(txt, dash='-'):
    words = txt.split(dash)
    i = next((i for i in range(1, len(words)) if words[i].isalpha()), None)  # whole word is alpha
    #i = next((i for i in range(1, len(words)) if words[i][:1].isalpha()), None)  # first char is alpha
    if i is not None:
        return dash.join(words[:i])
    else:
        return txt # or raise an error, or return empty string

print(eat_until_dash_alphaword("3x130-140k-ZZ-ABC"))
# 3x130-140k
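
The two variants of the condition (whole word is alphabetic versus only the first character is alphabetic) give the same result on the question's strings, but they can differ. The sketch below shows one such case; the function name prefix_before_alpha_word, the whole_word flag, and the sample input are made up for illustration.

```python
def prefix_before_alpha_word(txt, dash="-", whole_word=True):
    """Join the dash-separated words that precede the first alphabetic word."""
    words = txt.split(dash)
    for i in range(1, len(words)):
        word = words[i]
        # whole_word=True: every character must be a letter;
        # whole_word=False: only the first character must be a letter.
        is_alpha = word.isalpha() if whole_word else word[:1].isalpha()
        if is_alpha:
            return dash.join(words[:i])
    return txt  # no qualifying dash found

sample = "3x130-140k-Z9-ABC"  # made-up input; "Z9" is not purely alphabetic
print(prefix_before_alpha_word(sample))                    # 3x130-140k-Z9
print(prefix_before_alpha_word(sample, whole_word=False))  # 3x130-140k
```

Slicing with word[:1] rather than indexing with word[0] keeps the check safe for empty words, which appear when the input contains two dashes in a row.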

If you want to find the first dash followed by at least one alpha character, then it's even simpler. You don't need str.split at all, just a simple iteration on the characters:

from itertools import pairwise  # requires Python 3.10+

def eat_until_dash_alphachar(txt, dash='-'):
    for i, (a, b) in enumerate(pairwise(txt)):
        if a == dash and b.isalpha():
            return txt[:i]
    return txt # or raise an error, or return empty string

print(eat_until_dash_alphachar("3x130-140k-ZZ-ABC"))
# 3x130-140k
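
itertools.pairwise was added in Python 3.10. On older versions, a possible workaround is to pair each character with its successor using zip. This is only a sketch, and the function name eat_until_dash_alphachar_compat is made up for illustration.

```python
def eat_until_dash_alphachar_compat(txt, dash="-"):
    # zip(txt, txt[1:]) yields the same adjacent pairs as itertools.pairwise(txt)
    for i, (a, b) in enumerate(zip(txt, txt[1:])):
        if a == dash and b.isalpha():
            return txt[:i]
    return txt  # no dash followed by a letter

print(eat_until_dash_alphachar_compat("3x130-140k-ZZ-ABC"))  # 3x130-140k
print(eat_until_dash_alphachar_compat("3x140k-ZZ-ABC"))      # 3x140k
```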

CodePudding user response:

You might also split on the first occurrence of a hyphen followed by a char [a-zA-Z] and limit the maxsplit to 1.

import re

strings = [
    "3x130-140k-ZZ-ABC",
    "3x140k-ZZ-ABC"
]
pattern = r"-[a-zA-Z]"
for s in strings:
    print(re.split(pattern, s, maxsplit=1)[0])

Output:

3x130-140k
3x140k
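
One difference between the approaches shows up when no dash is followed by a letter: re.split returns the whole string, while re.search returns None, so the caller has to decide what to do. A minimal sketch of both behaviours follows; the helper names and the third sample string are made up for illustration.

```python
import re

DASH_LETTER = re.compile(r"-[a-zA-Z]")

def prefix_via_split(s):
    # With no match, split returns [s], so the whole string comes back.
    return DASH_LETTER.split(s, maxsplit=1)[0]

def prefix_via_search(s):
    # With no match, search returns None; here that is passed on to the caller.
    m = DASH_LETTER.search(s)
    return s[:m.start()] if m else None

for s in ["3x130-140k-ZZ-ABC", "3x140k-ZZ-ABC", "3x130-140"]:
    print(s, prefix_via_split(s), prefix_via_search(s))
```

Which behaviour is preferable depends on whether an input without a dash-plus-letter should be treated as valid or as an error.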