I have a complex case where I can't get any further. The goal is to check a string via RegEx for the following conditions:
- Exactly 12 letters
- The letters W, S, I, O, B, A, R and H must each appear exactly once in the string
- The letters T and E must each occur exactly twice in the string
- Important! The order must not matter
Example matches:
- WSITTOBAEERH
- HREEABOTTISW
- WSITOTBAEREH
My first attempt:
results = re.match(r"^W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}$", word)
The problem with this first attempt is that it only matches if the letters appear in the same order as in the RegEx. That violates condition 4.
My second attempt:
results = re.match(r"^[W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}]{12}$", word)
The problem with the second attempt: the order no longer matters, but the required count of each letter is ignored.
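To see why the counts are ignored: inside a character class `[...]`, the characters `{`, `1`, `2` and `}` are ordinary members of the set rather than quantifiers, so the pattern accepts any 12 characters drawn from the letters plus those four symbols. A small check, using the second pattern unchanged (the variable name `pattern` and the two extra test strings are only illustrative):

```python
import re

# Second attempt, unchanged. Inside [...], "{", "1", "2" and "}" are literal characters.
pattern = r"^[W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}]{12}$"

print(bool(re.match(pattern, "WSITTOBAEERH")))  # a valid word
print(bool(re.match(pattern, "WWWWWWWWWWWW")))  # twelve W's are accepted as well
print(bool(re.match(pattern, "{1}{2}{1}{2}")))  # braces and digits are accepted too
```

All three calls report a match, which shows that only the total length and the allowed character set are being checked.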
I only know the basics of RegEx so far and can't get any further here. If anyone has an idea what a regular expression that satisfies the four rules above looks like, I would be very grateful.
CodePudding user response:
One possibility, although I still think regex is inappropriate for this. It checks that every letter appears the required number of times and that there are 12 letters in total (so there's no room left for any more or other letters):
import re
for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(re.fullmatch('(?=.*W)(?=.*S)(?=.*I)(?=.*O)'
                       '(?=.*B)(?=.*A)(?=.*R)(?=.*H)'
                       '(?=.*T.*T)(?=.*E.*E).{12}', s))
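The same idea can be generated from the letter counts instead of being typed by hand. The following is only a sketch: `build_pattern` and `PATTERN` are hypothetical names, and it relies on the argument above (each letter appears at least its required number of times, and the total length leaves no room for extras).

```python
import re
from collections import Counter

def build_pattern(letters):
    """Return a regex matching any permutation of `letters` (hypothetical helper)."""
    counts = Counter(letters)
    # One lookahead per distinct letter, e.g. (?=.*T.*T) for a letter required twice.
    lookaheads = "".join(
        "(?=" + (".*" + re.escape(ch)) * n + ")"
        for ch, n in counts.items()
    )
    # The fixed total length rules out any additional or foreign characters.
    return lookaheads + ".{" + str(len(letters)) + "}"

PATTERN = re.compile(build_pattern("WSIOBARHTTEE"), re.DOTALL)

for s in "WSITTOBAEERH", "HREEABOTTISW", "WSITOTBAEREH", "WSITTOBAEERR":
    print(s, bool(PATTERN.fullmatch(s)))
```

The last string has no H and a second R, so it should be the only one rejected.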
Another one, checking that no letter other than T and E appears twice, that no letter appears three times, and that we have only the allowed letters, 12 in total:
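import re
for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(re.fullmatch(r'(?!.*([^TE]).*\1)'      # no letter other than T/E twice
                       r'(?!.*(.).*\2.*\2)'      # no letter three times (second group, so \2)
                       r'[WSIOBARHTE]{12}', s))  # only the allowed letters, 12 in total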
A simpler way:
for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(sorted(s) == sorted('WSIOBARHTTEE'))
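A closely related alternative, not part of the answer above, is to compare letter counts with `collections.Counter`, which expresses the "exactly once / exactly twice" rules directly. This is a minimal sketch; `TARGET` and `has_required_letters` are hypothetical names.

```python
from collections import Counter

# Required multiset of letters: T and E twice, the other eight once each.
TARGET = Counter("WSIOBARHTTEE")

def has_required_letters(word):
    """True if `word` uses exactly the required letters, in any order."""
    return Counter(word) == TARGET

for s in "WSITTOBAEERH", "HREEABOTTISW", "WSITOTBAEREH", "WSITTOBAEERR":
    print(s, has_required_letters(s))
```

Like the `sorted` comparison, this ignores order and rejects any string whose letter counts differ from the target.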