Home > Software engineering >  Regex for specific permutations of a word
Regex for specific permutations of a word

Time:04-18

I am working on a wordle bot and I am trying to match words using regex. I am stuck at a problem where I need to look for specific permutations of a given word.

For example, if the word is "steal" these are all the permutations: 'tesla', 'stale', 'steal', 'taels', 'leats', 'setal', 'tales', 'slate', 'teals', 'stela', 'least', 'salet'.

I had some trouble creating a regex for this, but eventually stumbled on positive lookaheads which solved the issue. regex -

'(?=.*[s])(?=.*[l])(?=.*[a])(?=.*[t])(?=.*[e])'

But, if we are looking for specific permutations, how do we go about it?

For example words that look like 's[lt]a[lt]e'. The matching words are 'steal', 'stale', 'state'. But I want to limit the count of l and t in the matched word, which means the output should be 'steal' & 'stale'. 1 obvious solution is this regex r'slate|stale', but this is not a general solution. I am trying to arrive at a general solution for any scenario and the use of positive lookahead above seemed like a starting point. But I am unable to arrive at a solution.

Do we combine positive lookaheads with normal regex?

s(?=.*[lt])a(?=.*[lt])e (Did not work)

Or do we write nested lookaheads or something?

A few more regex that did not work -

s(?=.*[lt]a[tl]e)
s(?=.*[lt])(?=.*[a])(?=.*[lt])(?=.*[e])

I tried to look through the available posts on SO, but could not find anything that would help me understand this. Any help is appreciated.

CodePudding user response:

You could append the regex which matches the permutations of interest to your existing regex. In your sample case, you would use:

(?=.*s)(?=.*l)(?=.*a)(?=.*t)(?=.*e)s[lt]a[lt]e

This will match only stale and slate; it won't match state because it fails the lookahead that requires an l in the word.

Note that you don't need the (?=.*s)(?=.*a)(?=.*e) in the above regex as they are required by the part that matches the permutations of interest. I've left them in to keep that part of the regex generic and not dependent on what follows it.

Demo on regex101

Note that to allow for duplicated characters you might want to change your lookaheads to something in this form:

(?=(?:[^s]*s){1}[^s]*)

You would change the quantifier on the group to match the number of occurrences of that character which are required.

  • Related