Regex Python except a sequence of letters

Time:07-09

Sorry to bother you, as I know this topic already exists, but after a lot of tries I still couldn't get the result I want.

My code:

import re

string1 = 'James CameronSteven Spielberg'
string2 = 'Martin Scorsese'
string3 = 'John McQueen'

# Split at every position preceded by a lowercase letter and followed by an uppercase one
result1 = re.split(r"(?<=[a-zéè])(?=[A-ZÉÈÊ])", string1)  # ['James Cameron', 'Steven Spielberg']
result2 = re.split(r"(?<=[a-zéè])(?=[A-ZÉÈÊ])", string2)  # ['Martin Scorsese']
result3 = re.split(r"(?<=[a-zéè])(?=[A-ZÉÈÊ])", string3)  # ['John Mc', 'Queen'] (unwanted split)

I'm trying to add an exception to my regex (it runs in a loop, so I want to use only one regex) so that names starting with "Mc" are not split.

Answer:

You can use

(?<=[a-zéè])(?<!Mc)(?=[A-ZÉÈÊ])

Details:

  • (?<=[a-zéè]) - a positive lookbehind that matches a location immediately preceded by a lowercase ASCII letter, é, or è
  • (?<!Mc) - a negative lookbehind that fails the match if Mc is immediately to the left of the current position
  • (?=[A-ZÉÈÊ]) - a positive lookahead that matches a location immediately followed by an uppercase ASCII letter, É, È, or Ê
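
As a quick check, here is a minimal sketch that applies the pattern above to the three strings from the question. The names SPLIT_RE and split_names are only illustrative, and it assumes Python 3.7 or later, since earlier versions of re.split do not split on empty (zero-width) matches.

```python
import re

# Split where a lowercase letter is followed by an uppercase one,
# unless the two characters just before the split point are "Mc".
SPLIT_RE = re.compile(r"(?<=[a-zéè])(?<!Mc)(?=[A-ZÉÈÊ])")


def split_names(text):
    """Split concatenated names at lowercase/uppercase boundaries (illustrative helper)."""
    return SPLIT_RE.split(text)


for s in ('James CameronSteven Spielberg', 'Martin Scorsese', 'John McQueen'):
    print(split_names(s))
# ['James Cameron', 'Steven Spielberg']
# ['Martin Scorsese']
# ['John McQueen']
```

Python lookbehinds must be fixed-width, so if other prefixes such as "Mac" need the same treatment, one option is to chain another negative lookbehind, for example (?<!Mc)(?<!Mac), instead of putting alternatives of different lengths inside a single one.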