I'm using Python's re.match. I want to match all strings that have 4 characters, not counting the ː symbol (it's an International Phonetic Alphabet symbol).
So the string "niːdi" should be matched. The regex should count it as 4 characters, not 5, because the ː symbol isn't counted.
So far, I have this. What should I add to make it not count the ː symbol?
regex = "^.{1,5}$"
I don't want to delete the ː symbol from any of my strings. It's important that it stays in the data.
CodePudding user response:
You can use
regex = "^(?=.{1,5}$)[^ː]*(?:ː[^ː]*)?$"
Details:
^ - start of string
(?=.{1,5}$) - the length is from 1 to 5 characters
[^ː]* - zero or more chars other than ː
(?:ː[^ː]*)? - an optional sequence of ː and zero or more chars other than ː
$ - end of string.
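As a quick check, here is a minimal sketch that runs this pattern over a few sample strings (the samples other than "niːdi" are made up for illustration). Note that the pattern limits the total length to 5 and allows at most one ː, so it also accepts five ordinary characters such as "abcde"; it does not count the non-ː characters exactly.

```python
import re

# Pattern from the answer above: 1 to 5 characters in total, at most one "ː".
regex = "^(?=.{1,5}$)[^ː]*(?:ː[^ː]*)?$"

# Only "niːdi" comes from the question; the other samples are illustrative.
samples = ["niːdi", "nidi", "abcde", "niːːd", "abcdef"]

# Map each sample to True/False depending on whether the pattern matches.
results = {s: bool(re.match(regex, s)) for s in samples}

for word, ok in results.items():
    print(word, ok)
```

If "abcde" should be rejected, the pattern would need to count the non-ː characters rather than cap the total length.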
CodePudding user response:
For the regex to match anything that has one to two characters, 'ː', then one to two characters, you could use something like this: ^.{1,2}ː.{1,2}$
If you need exactly two characters on each side, you can simplify the regex like this: ^.{2}ː.{2}$
I'm not sure about the character count issue, since ː is a character even though it doesn't add value for your data. Maybe you can subtract 1 from the count for each ː match you get.
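The subtraction idea could look something like this. It's only a sketch: length_without_mark is a hypothetical helper name, it uses str.count instead of a regex, and the sample words other than "niːdi" are made up.

```python
def length_without_mark(text, mark="ː"):
    # Count every character, then subtract one for each occurrence of the mark.
    return len(text) - text.count(mark)

# Only "niːdi" comes from the question; the other words are illustrative.
words = ["niːdi", "nidi", "niːd", "niːːdi"]

# Keep the words that have exactly 4 characters once "ː" is ignored.
matched = [w for w in words if length_without_mark(w) == 4]
print(matched)
```

The strings themselves are never modified, so the ː stays in the data.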
Good luck!
CodePudding user response:
Try something like
regex = "^(ː*[^ː]ː*){4}$"
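Here is a small sketch of how that behaves, with the pattern written using the IPA length mark ː (U+02D0) rather than an ASCII colon, since that is the symbol in the question. Each repetition of the group consumes exactly one non-ː character plus any ː around it, so {4} means exactly four counted characters. The sample words other than "niːdi" are made up.

```python
import re

# Four repetitions of: optional "ː", one non-"ː" character, optional "ː".
regex = "^(ː*[^ː]ː*){4}$"

# Only "niːdi" comes from the question; the other words are illustrative.
words = ["niːdi", "nidi", "ːnidi", "niːd", "niːdiz"]

# Keep the words with exactly 4 characters other than "ː".
matched = [w for w in words if re.match(regex, w)]
print(matched)
```

Unlike a cap on total length, this accepts any number of ː symbols as long as exactly four other characters are present.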