Python regex ignore word if character is still present

Time:09-23

I want a regex to match all strings with the following condition:

  • must start with '@', followed by at least 4 letters, then optional alphanumeric characters and/or hyphens

Here's my regex: \B@([A-Za-z]{4,}[A-Za-z0-9-_]*)

Unfortunately, it fails on the string @panda2@you-· j@jjjhh @hhhh @hhhhh because it also matches @panda2. panda2 needs to be excluded since it is immediately followed by another character (@) and so isn't a whole word on its own.

What am I doing wrong?

Also, how can I match but ignore the @ character at the beginning? I know I should use (?=) but don't know how.
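
For reference, the behaviour described in the question can be reproduced with a short script. This is only a sketch: the variable names (text, original, matches) are illustrative and not part of the original post.

```python
import re

text = '@panda2@you-· j@jjjhh @hhhh @hhhhh'

# The pattern from the question. \B before '@' only rules out an '@' that
# directly follows a word character (such as the one in 'j@jjjhh');
# nothing in the pattern checks what comes after the match.
original = re.compile(r'\B@([A-Za-z]{4,}[A-Za-z0-9-_]*)')

# findall returns the contents of the single capture group for each match.
matches = original.findall(text)
print(matches)  # ['panda2', 'hhhh', 'hhhhh']
```

The unwanted panda2 appears because the match is allowed to stop right before the second '@'.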

CodePudding user response:

I think you're after:

import re

text = '@panda2@you-· j@jjjhh @hhhh @hhhhh'
print(re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?!\S)', text))

Result:

['jjjhh', 'hhhh', 'hhhhh']

This differs from yours in that (?<=@) is a lookbehind: it checks that the match is preceded by an @ but doesn't include the @ in the result. The (?!\S) at the end is a negative lookahead: it checks that the match is not followed by a non-whitespace character. That is also true at the end of the string, unlike (?=\s), which would require a whitespace character to follow. Note that the \B from your pattern is gone as well, which is why jjjhh (from j@jjjhh) is matched.
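
The difference between the two lookaheads mentioned above shows up at the end of the string. Here is a minimal sketch, using a shortened input and illustrative variable names that are not from the original answer:

```python
import re

text = '@hhhh @hhhhh'

# (?!\S): "not followed by a non-whitespace character".
# This also succeeds at the very end of the string.
neg_lookahead = re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?!\S)', text)

# (?=\s): "followed by a whitespace character".
# This fails at the end of the string, so the last handle is lost.
pos_lookahead = re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?=\s)', text)

print(neg_lookahead)  # ['hhhh', 'hhhhh']
print(pos_lookahead)  # ['hhhh']
```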

Also, since [A-Za-z] is a subset of [A-Za-z0-9-_], there's no need for the comma in {4,}: any letters beyond the first four are matched by the next part of the regex, and because you don't capture the first part as its own group, the overall match is the same.
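
A quick way to check the point about {4} versus {4,} is to run both forms side by side. The sketch below uses illustrative names; the second half adds a hypothetical two-group pattern (not from the question) to show the one case where the comma would make a visible difference.

```python
import re

text = '@panda2@you-· j@jjjhh @hhhh @hhhhh'

# Same overall matches with or without the comma.
with_comma = re.findall(r'(?<=@)[A-Za-z]{4,}[A-Za-z0-9-_]*(?!\S)', text)
without_comma = re.findall(r'(?<=@)[A-Za-z]{4}[A-Za-z0-9-_]*(?!\S)', text)
print(with_comma == without_comma)  # True

# Hypothetical variant: if the letter prefix were captured separately,
# the split between the two groups would depend on the comma.
greedy_groups = re.search(r'@([A-Za-z]{4,})([A-Za-z0-9-_]*)', '@hhhhh9').groups()
fixed_groups = re.search(r'@([A-Za-z]{4})([A-Za-z0-9-_]*)', '@hhhhh9').groups()
print(greedy_groups)  # ('hhhhh', '9')
print(fixed_groups)   # ('hhhh', 'h9')
```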
