Regex to match all characters in a string except a certain character - CodePudding

I'm using Python's re.match. I want to match all strings that have 4 characters, not counting the ː symbol (it's an International Phonetic Alphabet symbol).

So the string "niːdi" should be matched. Regex should count it as 4 characters, not 5, because the ː symbol isn't counted.

So far, I have this. What should I add to make it not count the ː symbol?

regex = "^.{1,5}$"

I don't want to delete the ː symbol from any of my strings. It's important that it stays in the data.

CodePudding user response：

You can use

regex = "^(?=.{1,5}$)[^ː]*(?:ː[^ː]*)?$"

Details:

^ - start of string
(?=.{1,5}$) - the length is from 1 to 5
[^ː]* - zero or more chars other than ː
(?:ː[^ː]*)? - an optional sequence of ː and zero or more chars other than ː
$ - end of string.
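
As a quick check of the pattern described above, here is a minimal sketch run against a few sample strings (the sample words and the helper name `matches` are made up for illustration). Note that the lookahead bounds the total length at 5 and the rest allows at most one ː, so a 5-character string without ː is accepted as well.

```python
import re

# Pattern from the answer above: total length 1 to 5, at most one ː (U+02D0).
pattern = re.compile(r"^(?=.{1,5}$)[^ː]*(?:ː[^ː]*)?$")

def matches(word):
    """Return True if the whole word satisfies the pattern."""
    return pattern.match(word) is not None

print(matches("niːdi"))   # 5 code points, one ː
print(matches("nidi"))    # 4 code points, no ː
print(matches("nidib"))   # 5 code points, no ː: still within the length limit
print(matches("niːdiː"))  # 6 code points, two ː: exceeds the length limit
```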

CodePudding user response：

To match one or two characters, then ː, then one or two characters, you could use something like this: ^.{1,2}ː.{1,2}$. If you need exactly two characters on each side, you can simplify the regex to ^.{2}ː.{2}$.

I'm not sure about the character count issue, since ː is still a character even though it shouldn't count for your data. Maybe you can subtract 1 from the length for each ː you find.
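
The subtraction idea could be sketched without a regex at all, assuming the ignored symbol is ː (U+02D0). The names `counted_length` and `has_four_counted_chars` are hypothetical helpers, not part of any library.

```python
LENGTH_MARK = "ː"  # U+02D0, the IPA length mark (assumed to be the only ignored symbol)

def counted_length(word, ignored=LENGTH_MARK):
    """Length of the word minus one for every occurrence of the ignored symbol."""
    return len(word) - word.count(ignored)

def has_four_counted_chars(word):
    """True if the word has exactly 4 characters once ː is left out of the count."""
    return counted_length(word) == 4

print(counted_length("niːdi"))
print(has_four_counted_chars("niːdi"))
```

The strings themselves are never modified, so ː stays in the data.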

Good luck!

CodePudding user response：

Try something like

regex = "^(ː*[^ː]ː*){4}$"
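
A minimal sketch of how this kind of pattern could be tried out, assuming the character to skip is the IPA length mark ː (U+02D0) rather than the ASCII colon. The `words` list is made-up sample data.

```python
import re

# Exactly four characters that are not ː, each optionally surrounded by ː.
# Assumption: the ignored symbol is ː (U+02D0), not the ASCII colon ":".
pattern = re.compile(r"^(?:ː*[^ː]ː*){4}$")

words = ["niːdi", "nidi", "niː", "niːdiː", "needing"]

# Keep only the words with exactly 4 counted characters; ː stays in the strings.
four_counted = [w for w in words if pattern.match(w)]
print(four_counted)
```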