I have the regex (?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)(?!\w).
Given the string @first@nope @second@Hello @my-friend, email@ [email protected] @friend, what can I do to exclude the strings @first and @second, since they are not whole words on their own?
In other words, exclude them because they are immediately followed by @.
CodePudding user response:
You can use
(?<![a-zA-Z0-9_.-])@(?=([A-Za-z]+[A-Za-z0-9_-]*))\1(?![@\w])
(?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])
See the regex demo. Details:
(?<![a-zA-Z0-9_.-]) - a negative lookbehind that matches a location that is not immediately preceded by an ASCII digit, an ASCII letter, _, . or -
@ - a @ char
(?=([A-Za-z]+[A-Za-z0-9_-]*)) - a positive lookahead with a capturing group inside that captures one or more ASCII letters and then zero or more ASCII letters, digits, - or _ chars
\1 - the Group 1 value (backreferences are atomic, no backtracking is allowed through them)
(?![@\w]) - a negative lookahead that fails the match if there is a word char (letter, digit or _) or a @ char immediately to the right of the current location.
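Note that I put the hyphens at the end of the character classes. This is best practice, because a hyphen in that position is always a literal - and cannot be read as a range operator.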
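To illustrate the hyphen note, here is a minimal Python sketch. The character classes and the sample string are made up for the demonstration and are not taken from the question. A hyphen between two chars creates a range, while a hyphen at the end is a literal:

```python
import re

# Hypothetical sample string, chosen only to show the difference.
sample = "a+b-c_dA9"

# Hyphen in the middle: [+-_] is the range from '+' (0x2B) to '_' (0x5F),
# which also covers digits and uppercase letters.
range_hits = re.findall(r"[+-_]", sample)

# Hyphen at the end: [+_-] matches only the three literal chars.
literal_hits = re.findall(r"[+_-]", sample)

print(range_hits)    # ['+', '-', '_', 'A', '9']
print(literal_hits)  # ['+', '-', '_']
```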
The (?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w]) alternative uses shorthand character classes, and the (?a) inline modifier (the equivalent of re.ASCII / re.A) makes \w match only ASCII chars (as in the original version). Remove (?a) if you plan to match any Unicode digits/letters.
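A quick way to check both patterns in Python is sketched below. The sample text follows the question's string, but user@example.com is a made-up stand-in address, and the variable names are only illustrative:

```python
import re

# Sample text modeled on the question; user@example.com is a hypothetical address.
text = "@first@nope @second@Hello @my-friend, email@ user@example.com @friend"

# Explicit ASCII character classes.
explicit = re.compile(
    r"(?<![a-zA-Z0-9_.-])@(?=([A-Za-z]+[A-Za-z0-9_-]*))\1(?![@\w])"
)
# Shorthand classes; (?a) restricts \w to ASCII.
shorthand = re.compile(r"(?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])")

explicit_hits = [m.group(0) for m in explicit.finditer(text)]
shorthand_hits = [m.group(0) for m in shorthand.finditer(text)]
# Group 1 holds the name without the leading @.
names = [m.group(1) for m in explicit.finditer(text)]

print(explicit_hits)   # ['@my-friend', '@friend']
print(shorthand_hits)  # ['@my-friend', '@friend']
print(names)           # ['my-friend', 'friend']
```

@first and @second are rejected because the char right after the captured name is @, and the lookahead plus backreference does not let the engine retry with a shorter name.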
CodePudding user response:
Another option is to assert a whitespace boundary to the left, and assert no word char or @ sign to the right.
(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])
The pattern matches:
(?<!\S) - Negative lookbehind, assert that there is no non-whitespace char to the left
@ - Match literally
([A-Za-z]+[\w-]+) - Capture group 1, match 1+ chars A-Za-z and then 1+ word chars or -
(?![@\w]) - Negative lookahead, assert that there is no @ or word char to the right
Or match a non-word boundary \B before the @ instead of using a lookbehind.
\B@([A-Za-z]+[\w-]+)(?![@\w])
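The two variants are not fully interchangeable. The Python sketch below uses a made-up sample string (not from the question) to show where they differ, assuming each pattern has a + quantifier on both parts of the name:

```python
import re

# Hypothetical sample text chosen to expose the difference between the variants.
text = "(@bob) @a @my-friend @first@nope"

# Requires whitespace or the start of the string before the @.
after_whitespace = re.compile(r"(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])")
# Requires only that the char before the @ is not a word char.
non_word_boundary = re.compile(r"\B@([A-Za-z]+[\w-]+)(?![@\w])")

ws_hits = [m.group(0) for m in after_whitespace.finditer(text)]
nb_hits = [m.group(0) for m in non_word_boundary.finditer(text)]

print(ws_hits)  # ['@my-friend']
print(nb_hits)  # ['@bob', '@my-friend']
```

The \B version also accepts a mention right after punctuation such as (, while the lookbehind version does not. Neither matches @a, since both parts of the name need at least one char each.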