I want a regex to match all strings with the following condition:
- must start with '@' followed by at least 4 letters then optional alphanumeric characters and/or hyphen
Here's my regex: \B@([A-Za-z]{4,}[A-Za-z0-9-_]*)
Unfortunately, it fails on this string:
@panda2@you-· j@jjjhh @hhhh @hhhhh
because it also matches @panda2. panda2 needs to be excluded, since it isn't a whole word on its own.
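Running the regex through Python's re.findall reproduces the problem (Python is assumed here only because the answer uses it). The pattern has one capturing group, so findall returns just the group, i.e. the text after each @:

```python
import re

text = '@panda2@you-· j@jjjhh @hhhh @hhhhh'

# The regex from the question. \B constrains what comes before the @,
# but nothing constrains what comes after the last matched character.
pattern = r'\B@([A-Za-z]{4,}[A-Za-z0-9-_]*)'

matches = re.findall(pattern, text)
print(matches)  # ['panda2', 'hhhh', 'hhhhh']
```

The match for panda2 simply stops before the second @. Note also that @jjjhh is not matched, because \B fails when the @ directly follows a word character such as j.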
What am I doing wrong?
Also, how can I match the name but leave the @ character at the beginning out of the result? I know I should use (?=), but I don't know how.
Answer:
I think you're after:
import re
text = '@panda2@you-· j@jjjhh @hhhh @hhhhh'
print(re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?!\S)', text))
Result:
['jjjhh', 'hhhh', 'hhhhh']
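Your regex also matches @panda2 because nothing after [A-Za-z0-9-_]* requires the match to end at whitespace or at the end of the string, so it just stops before the second @.
This version differs from yours in that the lookbehind (?<=@) checks that the matched string is preceded by an @, but doesn't include the @ in the match. And the negative lookahead (?!\S) at the end checks that the matched string is not followed by a non-whitespace character, i.e. it must be followed by whitespace or by the end of the string. That works at the end of a string, unlike (?=\s), which would require a whitespace character to actually follow.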
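A small sketch comparing the two lookaheads on the same string. The third pattern is an extra variant and my own assumption rather than something stated in the question: the pattern above does not look at what precedes the @, so j@jjjhh yields jjjhh (your \B@ skipped it). If the @ must instead be at the start of the string or after whitespace, the mirror-image negative lookbehind (?<!\S) handles that; here the @ is consumed and a capturing group makes findall return only the part after it.

```python
import re

text = '@panda2@you-· j@jjjhh @hhhh @hhhhh'

# (?!\S): the next character must be whitespace, or there is no next character.
not_followed_by_non_space = re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?!\S)', text)

# (?=\s): a whitespace character must actually follow, so the final @hhhhh is lost.
followed_by_space = re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?=\s)', text)

# Variant (assumption): (?<!\S) also requires the @ to be at the start of the
# string or after whitespace, so j@jjjhh is skipped. The @ is part of the match,
# but with one capturing group findall returns only the text after it.
whole_words_only = re.findall(r'(?<!\S)@([A-Za-z]{4}[A-Za-z0-9-_]*)(?!\S)', text)

print(not_followed_by_non_space)  # ['jjjhh', 'hhhh', 'hhhhh']
print(followed_by_space)          # ['jjjhh', 'hhhh']
print(whole_words_only)           # ['hhhh', 'hhhhh']
```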
Also, since [A-Za-z] is included in [A-Za-z0-9-_], there's no need for the comma in {4,}: any letters beyond the first four are matched by the next part of the regex, and because you don't capture the first part as a separate group, the overall match is the same.
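A quick sanity check (not a proof) that {4} and {4,} give the same matches here. The second sample string is hypothetical, chosen only to exercise longer names, hyphens and underscores:

```python
import re

samples = [
    '@panda2@you-· j@jjjhh @hhhh @hhhhh',
    '@abcdefgh @ab-cd-ef @abcd_1 @abc',  # hypothetical extra test input
]

exact = r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?!\S)'
at_least = r'(?<=@)[A-Za-z]{4,}[A-Za-z0-9-_]*(?!\S)'

# Letters past the fourth are absorbed by [A-Za-z0-9-_]* in the first pattern
# and by {4,} in the second; with no group around that part, findall returns
# the same whole-match text either way.
results = [(re.findall(exact, s), re.findall(at_least, s)) for s in samples]
for a, b in results:
    print(a == b, a)
# True ['jjjhh', 'hhhh', 'hhhhh']
# True ['abcdefgh', 'abcd_1']
```

If the first part were wrapped in its own capturing group, the two forms would return different group contents, which is why the lack of a group matters.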