I am trying to write what I thought would be a simple regex pattern, but it turned out to be unexpectedly complicated.
I am trying to detect if:
- Two alternating words are not used in turns in a sentence: do detect "Cat cat." do not detect "Cat dog."
- There can be one or more other words between these words: do detect "The cat chased another cat." do not detect "The cat chased another dog."
- The words can be present more than one time in the sentence: do detect: "The cat chased the dog after the cat had chased another cat." do not detect: "The cat chased the dog after the cat had chased another dog."
- The sentence may include punctuation: do detect: "The cat chased the cat, and another cat chased, well – another dog." do not detect: "The cat chased the dog, and another cat chased, well – another dog."
This is what I have so far (in AutoHotkey):
regex := "^(?:(?:(cat\b.*?(?<!cat)\bdog)|(dog\b.*?(?<!dog)\bcat)) |(?:cat|dog)\b.*?(?:cat|dog)\b)$"
string := "The cat chased the cat, and another cat chased, well – another dog."
; In AutoHotkey, ~= takes the pattern as a string; options such as i) are prefixed to the pattern
if (string ~= "i)" . regex) {
    MsgBox, in turns
} else {
    MsgBox, not in turns
}
But it does not work, and I'm stuck.
CodePudding user response:
This should be a piece of cake with a regex backreference. You could do something like:
/(\b\w+\b).*\b\1\b/
This regex matches if any whole word is repeated later in the string. You can play with it in an online regex tester.
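To see how the backreference idea behaves on the sentences from the question, here is a rough sketch. It is written in Python rather than AutoHotkey only so it can be run anywhere, and it assumes the word part of the pattern is meant to be `\w+`. The function name `has_repeated_word` is made up for this example.

```python
import re

# Backreference pattern from the answer, assuming the word part is \w+.
# Group 1 captures a whole word; \1 requires the same word to appear again later.
REPEATED_WORD = re.compile(r"(\b\w+\b).*\b\1\b", re.IGNORECASE)


def has_repeated_word(sentence: str) -> bool:
    """Return True if any whole word occurs at least twice in the sentence."""
    return REPEATED_WORD.search(sentence) is not None


print(has_repeated_word("The cat chased another cat."))  # True
print(has_repeated_word("The cat chased another dog."))  # False
# Caveat: "cat" and "the" both repeat here, so this is reported as a match
# even though cat and dog are used in turns.
print(has_repeated_word("The cat chased the dog after the cat had chased another dog."))  # True
```

Note the last case: the pattern reports any repeated word, so it does not check on its own whether "cat" and "dog" alternate.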
CodePudding user response:
To rephrase the problem: exclude/ignore a word between 2 words OR determine a specific word order in a sentence.
(cat(?:(?!dog).)*cat)|(dog(?:(?!cat).)*dog)
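This regex works like this:
(cat(?:(?!dog).)*cat) finds two "cat" words with no "dog" between them.
(dog(?:(?!cat).)*dog) finds two "dog" words with no "cat" between them.
(?:(?!dog).)* and (?:(?!cat).)* are non-capturing groups with a negative lookahead: they consume one character at a time, but only while the excluded word ("dog" or "cat") does not start at the current position.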
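As a quick check of the pattern against the example sentences from the question, here is a minimal sketch. It is in Python rather than AutoHotkey so it is easy to run; the lookahead syntax is the same. The names `NOT_IN_TURNS` and `not_in_turns` are invented for this example, and case-insensitive matching is assumed because the question uses the `i` option.

```python
import re

# Two "cat" with no "dog" in between, or two "dog" with no "cat" in between.
NOT_IN_TURNS = re.compile(
    r"(cat(?:(?!dog).)*cat)|(dog(?:(?!cat).)*dog)",
    re.IGNORECASE,
)


def not_in_turns(sentence: str) -> bool:
    """Return True if the same word appears twice without the other one in between."""
    return NOT_IN_TURNS.search(sentence) is not None


examples = [
    "Cat cat.",
    "Cat dog.",
    "The cat chased another cat.",
    "The cat chased another dog.",
    "The cat chased the dog after the cat had chased another cat.",
    "The cat chased the dog after the cat had chased another dog.",
]
for sentence in examples:
    print(not_in_turns(sentence), sentence)
```

The pattern has no word boundaries, so a word such as "category" would also count as "cat"; wrapping each word in `\b` (for example `\bcat\b`) is one way to tighten that if it matters.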
"Antipattern" (whole negation, finds only correct sentences):
^((?!((cat(?:(?!dog).)*cat)|(dog(?:(?!cat).)*dog))).)*$
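The negated form can be tried the same way. This is only a sketch in Python (not AutoHotkey), with made-up names `IN_TURNS` and `in_turns`, and it assumes case-insensitive matching as in the question.

```python
import re

# Whole-string negation: at every position, refuse to advance if a
# "cat ... cat" (without dog) or "dog ... dog" (without cat) run starts there.
IN_TURNS = re.compile(
    r"^((?!((cat(?:(?!dog).)*cat)|(dog(?:(?!cat).)*dog))).)*$",
    re.IGNORECASE,
)


def in_turns(sentence: str) -> bool:
    """Return True only if cat and dog never repeat without the other in between."""
    return IN_TURNS.match(sentence) is not None


print(in_turns("The cat chased the dog after the cat had chased another dog."))  # True
print(in_turns("The cat chased the dog after the cat had chased another cat."))  # False
```

Be aware that a sentence containing neither word also matches this form, since there is nothing in it to violate the rule.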