How to match multiple words using regex ignore order but exclusive

Time:11-20

I'm struggling to implement a variant of exact matching with regex. I would like to check that a string contains several words (e.g. Apple, Bat, Car), ignoring order but matching exclusively (i.e. rejecting strings with extra words or too few words). For example, using the list above, I'd like the following outcomes (true/false):

  • Bat, Car, Apple (True)
  • Car, Bat, Apple (True)
  • Apple, Car, Bat (True)
  • Apple, Car, Bat, Stick (False)
  • Bat, Car (False)
  • Apple (False)

I have tried two things:

(1) lookahead assertions

^(?=.*Apple)(?=.*Bat)(?=.*Car).*
  • Bat, Car, Apple (True)
  • Car, Bat, Apple (True)
  • Apple, Car, Bat (True)
  • Apple, Car, Bat, Stick (True)
  • Bat, Car (False)
  • Apple (False)

This almost works, but it allows strings with additional words (e.g. the case with "Stick"). What can I add to exclude these cases, assuming "Stick" can be any other word and there could be multiple additional words?

(2) Following related Q/A examples on stack overflow

^(Apple|Bat|Car|[,\s])+$
  • Bat, Car, Apple (True)
  • Car, Bat, Apple (True)
  • Apple, Car, Bat (True)
  • Apple, Car, Bat, Stick (False)
  • Bat, Car (True)
  • Apple (True)

Which again almost works, but it incorrectly includes the smaller subsets.

Note: my list of words to match is just an example; the actual list will vary.

CodePudding user response:

First, this stretches what regex is good at; you may be better off using other string functions (depending on the language).
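As a rough illustration of the "other string functions" route, here is a minimal Python sketch. It assumes the words are separated by commas and that matching is case-sensitive; the function name matches_exactly is made up for this example.

```python
def matches_exactly(text, words):
    """Return True if text holds exactly the given words, in any order."""
    # Split on commas and trim the whitespace around each item.
    items = [part.strip() for part in text.split(",")]
    # Comparing sorted lists ignores order but still rejects
    # missing, extra, or duplicated words.
    return sorted(items) == sorted(words)


words = ["Apple", "Bat", "Car"]
for line in ["Bat, Car, Apple", "Apple, Car, Bat, Stick", "Bat, Car", "Apple"]:
    print(line, "->", matches_exactly(line, words))
```

Because the word list is an ordinary Python list here, a variable list needs no pattern rebuilding at all.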

Regex: ^(apple|bat|car), (?!\1)(apple|bat|car), (?!\1|\2)(apple|bat|car)$

demo: https://regex101.com/r/Yc8CVj/2

Very rough human translation: at the start of the line, capture any one of the words; check that the next word is different and capture it if it is one of the other two; then check that the last word is the one left over and that the line ends right after it.

Features

  • prevents duplicates (apple, apple, car)
  • (according to demo) around 30 steps for match
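Since the word list in the question is variable, the same backreference pattern can be generated rather than typed by hand. This is only a sketch in Python: build_permutation_regex is a hypothetical helper, the ", " separator is taken from the pattern above, and re.IGNORECASE is an assumption (the pattern is written in lowercase while the question uses capitalised words).

```python
import re


def build_permutation_regex(words, sep=", "):
    """Build ^(a|b|c), (?!\\1)(a|b|c), (?!\\1|\\2)(a|b|c)$ for any word list."""
    alt = "|".join(re.escape(w) for w in words)
    parts = []
    for i in range(len(words)):
        if i == 0:
            parts.append("(" + alt + ")")
        else:
            # Negative lookahead over all earlier groups, so a word
            # that was already captured cannot be captured again.
            backrefs = "|".join("\\%d" % k for k in range(1, i + 1))
            parts.append("(?!" + backrefs + ")(" + alt + ")")
    # Assumption: case-insensitive matching.
    return re.compile("^" + re.escape(sep).join(parts) + "$", re.IGNORECASE)


rx = build_permutation_regex(["Apple", "Bat", "Car"])
for line in ["Bat, Car, Apple", "Apple, Car, Bat, Stick", "Bat, Car", "Apple, Apple, Car"]:
    print(line, "->", bool(rx.match(line)))
```

One caveat with this construction: if one word is a prefix of another (say "Car" and "Cart"), the (?!\1) check would also reject the longer word, so the sketch assumes no word in the list starts with another.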

CodePudding user response:

Try:

(?=.*Apple)(?=.*Car)(?=.*Bat)(?!.*(?:,|^)(?:(?!Apple|Bat|Car).)+(?:,|$))^.*$

Regex demo.


(?=.*Apple)(?=.*Car)(?=.*Bat) - we want to match a line where Apple, Car and Bat are all found

(?!.*(?:,|^)(?:(?!Apple|Bat|Car).)+(?:,|$)) - we don't want to match a line where any other word is found. A word is whatever sits between commas and/or the start/end of the line

^.*$ - we want to match the whole line


EDIT: Regex with word boundaries \b (to not match Cartography for example):

(?=.*\bApple\b)(?=.*\bCar\b)(?=.*\bBat\b)(?!.*(?:,|^)(?:(?!\b(?:Apple|Bat|Car)\b).)+(?:,|$))^.*$

Regex demo.
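For a variable word list, the word-boundary version can be assembled from its pieces. The following Python sketch assumes the tempered part reads (?:(?!...).)+ with a + quantifier, and build_exclusive_regex is a hypothetical helper name, not something from the answer.

```python
import re


def build_exclusive_regex(words):
    """Require every listed word and forbid any comma-separated item that is not one."""
    alt = "|".join(re.escape(w) for w in words)
    # One positive lookahead per required word.
    required = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    # Negative lookahead: fail if some item between commas (or the line
    # start/end) never reaches the start of an allowed word.
    forbidden = r"(?!.*(?:,|^)(?:(?!\b(?:" + alt + r")\b).)+(?:,|$))"
    return re.compile(required + forbidden + r"^.*$")


rx = build_exclusive_regex(["Apple", "Bat", "Car"])
for line in ["Bat, Car, Apple", "Apple, Car, Bat, Stick", "Bat, Car", "Cartography, Apple, Bat, Car"]:
    print(line, "->", bool(rx.match(line)))
```

Unlike the backreference approach, this pattern does not appear to reject repeated words: a line such as "Apple, Apple, Bat, Car" still contains all three words and no foreign word, so it matches.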

CodePudding user response:

An idea to just check for exactly three words after the lookaheads:

^(?=.*?\bApple\b)(?=.*?\bBat\b)(?=.*?\bCar\b)\w+(?:, ?\w+){2}$

See this demo at regex101 - I further added \b word boundaries around the words.
\w+ matches one or more word characters, and , ? matches a comma and an optional space between words.
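The counting idea carries over to a variable list by deriving the repeat count from the number of words. A minimal Python sketch, assuming the words are distinct and made only of \w characters; build_counted_regex is a made-up name for illustration.

```python
import re


def build_counted_regex(words):
    """Lookaheads require each word; the body allows exactly len(words) items."""
    required = "".join(r"(?=.*?\b" + re.escape(w) + r"\b)" for w in words)
    # One word, then (comma, optional space, word) repeated n - 1 times.
    body = r"\w+(?:, ?\w+){" + str(len(words) - 1) + "}"
    return re.compile("^" + required + body + "$")


rx = build_counted_regex(["Apple", "Bat", "Car"])
for line in ["Bat, Car, Apple", "Apple, Car, Bat, Stick", "Bat, Car", "Apple, Apple, Car"]:
    print(line, "->", bool(rx.match(line)))
```

Because all n distinct words must appear in a line of exactly n items, a repeated word leaves no room for one of the others, so duplicates are rejected as a side effect.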
