Building a RegEx, 12 letters without order, fixed number of individual letters

Time: 02-22

I have a complex case where I can't get any further. The goal is to check a string via RegEx for the following conditions:

  • Exactly 12 letters
  • The letters W, S, I, O, B, A, R and H must each appear exactly once in the string
  • The letters T and E must each appear exactly twice in the string
  • Important! The order of the letters must not matter
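The rules fix the string's contents completely: eight letters once plus T and E twice each gives 8 × 1 + 2 × 2 = 12, which is exactly the allowed length. So no other letters can appear, and a valid string is any rearrangement of WSIOBARHTTEE. A small Python check of that arithmetic (the dictionary `required` is just an illustrative encoding of rules 2 and 3):

```python
# Required count of each letter, encoding rules 2 and 3 of the question
required = {**dict.fromkeys("WSIOBARH", 1), "T": 2, "E": 2}

# 8 letters once plus 2 letters twice: 8 * 1 + 2 * 2 = 12
total = sum(required.values())
print(total)  # 12

# So a valid string is some rearrangement of exactly these letters
target = "".join(letter * n for letter, n in required.items())
print(target)  # WSIOBARHTTEE
```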

Example matches:

  • WSITTOBAEERH
  • HREEABOTTISW
  • WSITOTBAEREH

My first attempt:

results = re.match(r"^W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}$", word)

The problem with this first attempt is that it only matches if the letters appear in exactly the order written in the regex. That violates condition 4.
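To illustrate: the first pattern is a fixed sequence, so it accepts exactly one string, WSITTOBAEERH. (The `{1}` quantifiers are redundant, since a single character already matches once.) A quick check against the three examples from the question, with `first` as an illustrative variable name:

```python
import re

# The first attempt from the question: one fixed sequence of letters
first = r"^W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}$"

# Only the single string spelled out by the pattern can match
print(bool(re.match(first, "WSITTOBAEERH")))  # True
print(bool(re.match(first, "HREEABOTTISW")))  # False
print(bool(re.match(first, "WSITOTBAEREH")))  # False
```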

My second attempt:

results = re.match(r"^[W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}]{12}$", word)

The problem with the second attempt: the order no longer matters, but the exact number of each letter is ignored.
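The reason the second attempt loses the counts: inside square brackets, `{1}` and `{2}` are not quantifiers. The brackets form a character class that matches any single one of W, S, I, T, O, B, A, E, R, H and the literal characters `{`, `}`, `1` and `2`, and `{12}` then just asks for 12 such characters. A short demonstration (the test strings are made up for illustration):

```python
import re

# The second attempt from the question
second = r"^[W{1}S{1}I{1}T{2}O{1}B{1}A{1}E{2}R{1}H{1}]{12}$"

# Inside [...], braces and digits are plain class members, not quantifiers
print(bool(re.match(second, "HREEABOTTISW")))  # True: a valid string
print(bool(re.match(second, "W" * 12)))        # True: twelve W's
print(bool(re.match(second, "{1}2" * 3)))      # True: not even letters
```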

I only know the basics of RegEx so far and am stuck here. If anyone knows a regular expression that satisfies the four rules above, I would be very grateful.

Answer:

One possibility, although I still think regex is the wrong tool for this. It checks that every letter appears at least the required number of times and that the string is 12 characters long in total, so there is no room left for any more or any other letters:

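import re

for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    # Each lookahead requires a letter at least once (T and E at least twice);
    # .{12} then caps the length, so every count must be exact.
    print(re.fullmatch('(?=.*W)(?=.*S)(?=.*I)(?=.*O)'
                       '(?=.*B)(?=.*A)(?=.*R)(?=.*H)'
                       '(?=.*T.*T)(?=.*E.*E).{12}', s))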
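The same lookahead idea can be generated from a table of required counts instead of being written by hand, which makes it easier to change the letters later. This is a sketch under that assumption: `build_pattern` is a hypothetical helper, not part of the answer, and `(?=(?:.*T){2})` is just another way of writing `(?=.*T.*T)`. It relies on `re.fullmatch`, so that the final `.{12}` covers the whole string:

```python
import re


def build_pattern(counts):
    """Hypothetical helper: build a lookahead pattern from {letter: count}.

    Each lookahead requires at least `count` occurrences of its letter;
    used with re.fullmatch, the final .{total} limits the length to the
    sum of all counts, which turns every "at least" into "exactly".
    """
    lookaheads = "".join(
        f"(?=(?:.*{re.escape(letter)}){{{n}}})" for letter, n in counts.items()
    )
    total = sum(counts.values())
    return lookaheads + f".{{{total}}}"


counts = {**dict.fromkeys("WSIOBARH", 1), "T": 2, "E": 2}
pattern = build_pattern(counts)

for s in "WSITTOBAEERH", "HREEABOTTISW", "WSITOTBAEREH", "WSITTOBAEERX":
    print(s, bool(re.fullmatch(pattern, s)))  # True, True, True, False
```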

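Another one checks that no letter other than T and E appears twice, that no letter appears three times, and that the string consists of only the desired letters, 12 in total. Since those limits allow at most 8 + 2 + 2 = 12 letters, reaching 12 forces every count to be exact: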

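import re

for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(re.fullmatch(r'(?!.*([^TE]).*\1)'     # no letter other than T and E twice
                       r'(?!.*(.).*\2.*\2)'     # no letter three times (this is group 2)
                       r'[WSIOBARHTE]{12}', s))  # only allowed letters, 12 in total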
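A quick sanity check of the second approach against strings that break a rule. This sketch uses `\2` in the second lookahead because the group captured there is group 2; a `\1` at that point would refer to the group from the first lookahead instead. The test strings are made up for illustration:

```python
import re

pattern = re.compile(r'(?!.*([^TE]).*\1)'   # no letter other than T and E twice
                     r'(?!.*(.).*\2.*\2)'   # no letter three times
                     r'[WSIOBARHTE]{12}')   # only allowed letters, 12 in total

cases = {
    'HREEABOTTISW': True,    # valid rearrangement
    'TTTWSIOBARHE': False,   # T three times, E only once
    'WWSITOBAEERH': False,   # W twice, T only once
    'WSITTOBAEERHX': False,  # 13 characters
}
for s, expected in cases.items():
    assert (pattern.fullmatch(s) is not None) == expected, s
print("all cases behave as expected")
```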

A simpler way, without any regex: sort the letters and compare them with the sorted target letters:

for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    # True exactly when s is a rearrangement of WSIOBARHTTEE
    print(sorted(s) == sorted('WSIOBARHTTEE'))
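A variant of the same non-regex idea counts the letters with `collections.Counter` instead of sorting them. For 12 characters the difference is negligible; counting simply states the rules ("this many of each letter") more directly. The extra test string is made up for illustration:

```python
from collections import Counter

# Required letter counts: each of WSIOBARH once, T and E twice
REQUIRED = Counter('WSIOBARHTTEE')

for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH', 'WSITTOBAEERR':
    print(s, Counter(s) == REQUIRED)  # True, True, True, False
```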