I'm extracting users who get tagged in messages. The usernames contain digits, so I need to extract words from a long string that meet the following conditions:
- it has to be 10 to 14 characters long
- it can start with @, but it is not necessary (this is the only special character allowed, and if the word contains it, it has to be the first character)
- it can contain digits and letters
- it can be only digits, but it can't be only letters
Example:
s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146. "
     "His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point.")
Words that the regular expression should extract:
@THYSSEN1145
THYSSEN1146
1234567891011
CodePudding user response:
You might use
(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*
The pattern matches:
- (?<!\S) Assert a whitespace boundary to the left
- @? Match an optional @
- (?=[A-Za-z\d]{10,14}\b) Assert 10 to 14 letters or digits followed by a word boundary
- [A-Za-z]*\d[A-Za-z\d]* Match letters and digits in the ranges A-Za-z\d, with at least one digit
import re
pattern = r"(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*"
s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146. \n"
"His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point.")
print(re.findall(pattern, s))
Output
['@THYSSEN1145', 'THYSSEN1146', '1234567891011']
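If it helps to see where the length limits and the @ rule bite, here is a small sketch that writes the same pattern in re.VERBOSE form and runs it on a few made-up boundary cases. The names USERNAME_RE and find_usernames and the sample words are only for illustration, not part of the original answer. Note that with this pattern the optional @ is not counted toward the 10 to 14 characters.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part carries a comment.
USERNAME_RE = re.compile(
    r"""
    (?<!\S)                     # no non-whitespace character directly to the left
    @?                          # optional leading @
    (?=[A-Za-z\d]{10,14}\b)     # 10 to 14 letters/digits, then a word boundary
    [A-Za-z]*\d[A-Za-z\d]*      # letters and digits, with at least one digit
    """,
    re.VERBOSE,
)


def find_usernames(text):
    """Hypothetical helper: return every match of USERNAME_RE in text."""
    return USERNAME_RE.findall(text)


# Made-up boundary cases: 9, 10, 14 (plus @) and 15 characters,
# a letters-only word, and a word with @ in the middle.
samples = ("A12345678 A123456789 @A1234567890123 "
           "A12345678901234 ABCDEFGHIJKL 12@3456789012")
print(find_usernames(samples))
# ['A123456789', '@A1234567890123']
```

Because the right-hand side is checked with \b rather than a whitespace lookahead, a username followed by punctuation such as "." or "," still matches, which is what the example sentence needs.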