Sorry to bother you, because I know this topic already exists, but after a lot of tries I still couldn't get the result I want.
My code:
import re

string1 = 'James CameronSteven Spielberg'
string2 = 'Martin Scorsese'
string3 = 'John McQueen'
result1 = re.split("(?<=[a-zéè])(?=[A-ZÉÈÊ])", string1) # ['James Cameron', 'Steven Spielberg']
result2 = re.split("(?<=[a-zéè])(?=[A-ZÉÈÊ])", string2) # ['Martin Scorsese']
result3 = re.split("(?<=[a-zéè])(?=[A-ZÉÈÊ])", string3) # ['John Mc', 'Queen']
I'm trying to add an exception to my regex (it runs in a loop, so I want to use only one regex) so that names starting with "Mc" are not split.
CodePudding user response:
You can use
(?<=[a-zéè])(?<!Mc)(?=[A-ZÉÈÊ])
See the regex demo. Details:
(?<=[a-zéè]) - a positive lookbehind that matches a location that is immediately preceded by an a-z, é, or è letter
(?<!Mc) - a negative lookbehind that fails the match if there is Mc immediately to the left of the current position
(?=[A-ZÉÈÊ]) - a positive lookahead that matches a location that is immediately followed by an uppercase ASCII letter or an É, È, or Ê letter.
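As a quick check, here is a minimal sketch that applies the pattern above to the three strings from the question. The names NAME_BOUNDARY and split_names are only illustrative helpers, not part of the original code. It assumes Python 3.7 or later, since older versions of re.split do not split on empty (zero-width) matches.

```python
import re

# Pattern from the answer: split between a lowercase letter and an uppercase
# letter, unless the text immediately to the left of that position is "Mc".
NAME_BOUNDARY = re.compile(r"(?<=[a-zéè])(?<!Mc)(?=[A-ZÉÈÊ])")


def split_names(text):
    """Split concatenated names at each position matched by NAME_BOUNDARY."""
    return NAME_BOUNDARY.split(text)


for s in ("James CameronSteven Spielberg", "Martin Scorsese", "John McQueen"):
    print(split_names(s))
# ['James Cameron', 'Steven Spielberg']
# ['Martin Scorsese']
# ['John McQueen']
```

A string such as 'John McQueenSteven Spielberg' should still be split between the two names, because only the position right after "Mc" is excluded.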