I have strings with multiple dash characters, and I want to get the substring up to a dash, but only if that dash is followed by a letter rather than a digit.
I have used `re.search("^([^-]+)", "3x130-140k-ZZ-ABC")`
but it returns `3x130`, which is the text up to the first dash. I want `3x130-140k`, because only the second dash is followed by a letter. So I want a regex that returns `3x130-140k` from `3x130-140k-ZZ-ABC`, and `3x140k` from `3x140k-ZZ-ABC`.
Answer:
You can use `^.*?(?=-[A-Za-z])`.

Details:

- `^` - start of string
- `.*?` - any zero or more chars other than line break chars, as few as possible
- `(?=-[A-Za-z])` - a location that is immediately followed by `-` and an ASCII letter (use `(?=-[^\W\d_])` to match any Unicode letter).
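To make the "as few as possible" part concrete, here is a small sketch (not part of the answer) comparing the lazy `.*?` with a greedy `.*` on the question's first sample string. The greedy version still satisfies the lookahead, but at the last dash-plus-letter position instead of the first.

```python
import re

text = "3x130-140k-ZZ-ABC"

# Lazy .*? stops at the FIRST position that is followed by "-" + letter
lazy = re.search(r"^.*?(?=-[A-Za-z])", text).group()

# Greedy .* grabs everything, then backtracks only until the lookahead
# succeeds, so it stops at the LAST such position
greedy = re.search(r"^.*(?=-[A-Za-z])", text).group()

print(lazy)    # 3x130-140k
print(greedy)  # 3x130-140k-ZZ
```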
Python demo:

```python
import re
text = "3x130-140k-ZZ-ABC"
m = re.search(r"^.*?(?=-[A-Za-z])", text)
if m:
    print(m.group())  # => 3x130-140k
```
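A slightly extended sketch of the same approach, run on both strings from the question. It also compares the ASCII lookahead with the Unicode `(?=-[^\W\d_])` variant on a made-up input, `3x130-140k-Ärger`; the `prefix` helper and that third string are hypothetical, added only for illustration.

```python
import re

ascii_pattern = re.compile(r"^.*?(?=-[A-Za-z])")
# [^\W\d_] = a word char that is not a digit or underscore, i.e. a Unicode letter
unicode_pattern = re.compile(r"^.*?(?=-[^\W\d_])")

def prefix(pattern, text):
    """Hypothetical helper: text before the first dash + letter, or None."""
    m = pattern.search(text)
    return m.group() if m else None

for s in ["3x130-140k-ZZ-ABC", "3x140k-ZZ-ABC", "3x130-140k-Ärger"]:
    print(s, "->", prefix(ascii_pattern, s), "|", prefix(unicode_pattern, s))
```

For the last string the ASCII pattern finds no match (`Ä` is not in `[A-Za-z]`), while the Unicode variant still returns `3x130-140k`.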
Answer:
You can use `-[a-zA-Z]` to match a dash followed by a letter, and then take the portion of the string before this match:

```python
import re
raw_string = "3x130-140k-ZZ-ABC"
# [a-zA-Z], not [a-zA-z]: the range A-z would also match [ \ ] ^ _ and `
match = re.search("-[a-zA-Z]", raw_string)
print(raw_string[:match.start()])
```
Output:
3x130-140k
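If the string might contain no dash followed by a letter, `re.search` returns `None` and `match.start()` raises an `AttributeError`. A minimal sketch that guards against this, using a hypothetical helper name `before_dash_letter` and returning the input unchanged when there is no match (one possible choice; returning `None` or raising would be equally valid):

```python
import re

DASH_LETTER = re.compile(r"-[a-zA-Z]")

def before_dash_letter(s):
    """Hypothetical helper: text before the first '-' + ASCII letter, or s unchanged."""
    match = DASH_LETTER.search(s)
    # match.start() is the index of the dash itself, so the slice excludes the dash
    return s[:match.start()] if match else s

for s in ["3x130-140k-ZZ-ABC", "3x140k-ZZ-ABC", "3x130-140k"]:
    print(before_dash_letter(s))
```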
Answer:
No need for a regexp. Use `str.split(txt, '-')` to split on dashes, then use `next` to find the index of the first dash followed by a word that satisfies `word.isalpha()`.
```python
def eat_until_dash_alphaword(txt, dash='-'):
    words = txt.split(dash)
    # range stops at len(words) - 1; going up to len(words) would raise IndexError
    i = next((i for i in range(1, len(words)) if words[i].isalpha()), None)  # whole word is alpha
    #i = next((i for i in range(1, len(words)) if words[i][:1].isalpha()), None)  # first char is alpha ([:1] is safe on empty words)
    if i is not None:
        return dash.join(words[:i])
    else:
        return txt  # or raise an error, or return empty string

print(eat_until_dash_alphaword("3x130-140k-ZZ-ABC"))
# 3x130-140k
```
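The two `next(...)` lines differ only in which pieces count as "alphabetic". A sketch with a hypothetical `whole_word` flag shows where they diverge, using a made-up input `3x130-140k-Z9-ABC` whose third piece starts with a letter but is not entirely letters:

```python
def eat_until_dash_alpha(txt, dash='-', whole_word=True):
    """Hypothetical combined variant of eat_until_dash_alphaword.

    whole_word=True:  stop before the first piece that is entirely alphabetic.
    whole_word=False: stop before the first piece whose first character is a letter.
    """
    words = txt.split(dash)
    for i in range(1, len(words)):
        piece = words[i]
        # piece[:1] is '' for an empty piece, and ''.isalpha() is False
        if (piece if whole_word else piece[:1]).isalpha():
            return dash.join(words[:i])
    return txt

s = "3x130-140k-Z9-ABC"
print(eat_until_dash_alpha(s, whole_word=True))   # 3x130-140k-Z9
print(eat_until_dash_alpha(s, whole_word=False))  # 3x130-140k
```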
If you want to find the first dash followed by at least one alphabetic character, then it's even simpler. You don't need `str.split` at all, just a simple iteration over the characters:
```python
from itertools import pairwise

def eat_until_dash_alphachar(txt, dash='-'):
    # i is the index of a, the first character of each overlapping pair
    for i, (a, b) in enumerate(pairwise(txt)):
        if a == dash and b.isalpha():
            return txt[:i]
    return txt  # or raise an error, or return empty string

print(eat_until_dash_alphachar("3x130-140k-ZZ-ABC"))
# 3x130-140k
```
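`itertools.pairwise` was added in Python 3.10. On older versions, a minimal sketch of the same scan can pair each character with its successor via `zip(txt, txt[1:])`; the function name below is hypothetical:

```python
def eat_until_dash_alphachar_compat(txt, dash='-'):
    """Same scan as eat_until_dash_alphachar, without itertools.pairwise."""
    # zip(txt, txt[1:]) yields the same overlapping pairs as pairwise(txt)
    for i, (a, b) in enumerate(zip(txt, txt[1:])):
        if a == dash and b.isalpha():
            return txt[:i]
    return txt

print(eat_until_dash_alphachar_compat("3x130-140k-ZZ-ABC"))  # 3x130-140k
print(eat_until_dash_alphachar_compat("3x140k-ZZ-ABC"))      # 3x140k
```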
Answer:
You might also split on the first occurrence of a hyphen followed by a letter `[a-zA-Z]`, limiting `maxsplit` to 1.
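```python
import re

strings = [
    "3x130-140k-ZZ-ABC",
    "3x140k-ZZ-ABC"
]

pattern = r"-[a-zA-Z]"
for s in strings:
    # maxsplit passed by keyword; passing it positionally is deprecated since Python 3.13
    print(re.split(pattern, s, maxsplit=1)[0])
```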
Output:
3x130-140k
3x140k
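Because `-[a-zA-Z]` consumes the letter after the dash, the second piece returned by `re.split` loses its first character. That doesn't matter when you only take `[0]`, but if you also need the remainder, a lookahead that consumes only the dash keeps it intact. A small sketch comparing the two on the question's first string:

```python
import re

s = "3x130-140k-ZZ-ABC"

# Consuming pattern: the "Z" right after the dash is removed from the second piece
consumed = re.split(r"-[a-zA-Z]", s, maxsplit=1)

# Lookahead: only the dash is consumed, so the remainder stays whole
kept = re.split(r"-(?=[a-zA-Z])", s, maxsplit=1)

print(consumed)  # ['3x130-140k', 'Z-ABC']
print(kept)      # ['3x130-140k', 'ZZ-ABC']
```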