Is there a more efficient RegEx for cheating at Wordle?-CodePudding

So I have a list of all 5 letter words in the English language that I can interrogate when I'm really stuck at Wordle. I found this an excellent exercise for brushing up on my Regular Expressions in BBEDIT, which is what I tell myself I'm doing.

The way wordle works, I can have three conditions.

A letter that is somewhere in the word (and must be present)
A letter that is not present in the word
A letter that is correct in presence and position

Condition 3 is easy. If my start word "crone" has the n in the right place, my pattern is

...n.

And I can add condition 2 fairly easily with

^(?!.*[croe])...n.

If my next guess is "burns" I'll know there's an "s"

^(?!.*[croebur])^(?=.*s)...n.

And that it's not in the last position:

^(?!.*[croebur])^(?=.*s)...n[^s]

If my next (very poor) guess is 'stone' I'll know there's a 't'.

^(?!.*[croebur])^(?=.*s)^(?=.*t)sa.n.

So that's a workable formula.

But if my next guess were "wimpy" I'd know there was an 'i' in the answer, but I have to add an additional ^(?=.*i) which just feels inefficient. I tried grouping the letters that must be in the word by using a bracket set, ^(?=.*[ist]) but of course that will match targets that contain any one of those characters rather than all.

Is there a more efficient way to express the phrase "the word must contain all of the following letters to match" than a series of "start at the beginning, scan for occurence of this single character until the end" phrases?

CodePudding user response：

If you enter a word into Wordle, it displays all the matched characters in your word. It also shows the characters which exist in the word but not in the correct order.

Considering these requirements, I think you should create different rules for each letter's place. This way, your regex pattern keeps simple, and you get the search results quickly. Let me give an example:

Input word: crone
Matched Characters: ...n.
Characters in the wrong place: -
Next regex search pattern: ^[^crone][^crone][^crone]n[^crone]$
Input word: burns
Matched Characters: ...n.
Characters in the wrong place: s
Next regex search pattern: ^(?=\S*[s]\S*)[^bucrone][^bucrone][^bucrone]n[^bucrones]$ (Be careful, there is an "s" character in the last parenthesis because we know its place isn't there.)
Input word: stone
Matched Characters: s..n.
Characters in the wrong place: t
Next regex search pattern: ^(?=\S*[t]\S*)s[^tsbucrone][^sbucrone]n[^sbucrones]$ (Be careful, there is a "t" character in the first parenthesis because we know its place isn't there.)

^ => Start of the line
[^abc] => Any character except "a" and "b" and "c"
(?=\S*[t]\S*)=> There must be a "t" character in the given string
(?=\S*[t]\S*)(?=\S*[u]\S*)=> There must be "t" and "u" characters in the given string
$ => End of the line

When we look at performance tests of the regex patterns with a seven-word sample, my regex pattern found the result in 130 steps, whereas your pattern in 175 steps. The performance difference will increase as the word-list increase. You can review it from the following links:

Suggested pattern: https://regex101.com/r/mvHL3J/1
Your pattern: https://regex101.com/r/Nn8EwL/1

Note: You need to click the "Regex Debugger" link in the left sidebar to see the steps.

Note 2: I updated my response to fix the bug in the following comment.