Regex is capturing too much of the string

Time:10-27

I have these three regex patterns and I'm using JavaScript:

/(?<=[AND])\s?\[(.*?)\]/
/(?<=[OR])\s?\[(.*?)\]/
/(?<=[NOT])\s?\[(.*?)\]/

Given a string like -> AND [thing1, thing2, thing3] OR [thing4, thing5] NOT [thing6], I would expect the three patterns to return, in order:

thing1, thing2, thing3
thing4, thing5
thing6

When a user enters a string with a missing closing bracket, like this -> AND [thing1, thing2, thing3 OR [thing4, thing5], the first pattern returns:

thing1, thing2, thing3 OR [thing4, thing5

I'm trying to figure out how to prevent the regex from matching when a boolean keyword appears before the closing bracket. I've tried adding [^NOT]|[^OR] and ^NOT|^OR to the capturing group, but nothing I've done works right (regex newb here).

Also, if you see any other obvious mistakes in my current regex, please point them out.

CodePudding user response:

Why not combine each of these patterns into a single matching group? You seem to have some fundamental misconceptions, specifically regarding the use of square brackets ([]) in your pattern.

It's not clear why you've elected to wrap your logical tokens in square brackets. In RegExp, these are used specifically to denote character sets. As written, your original pattern matches any one of the two or three characters in the set, not the entire word (as appears to be your intention). You seem to have fallen victim to the same misconception in your attempt to include these logical tokens in the last group of your pattern.
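To make the character-set point concrete, here is a minimal sketch. The sample strings are made up for illustration and are not from the question. It shows that the lookbehind in the original first pattern is satisfied by any single A, N, or D, not only by the word AND:

```javascript
// The asker's first pattern: [AND] is a character set, not the word AND.
const charSetPattern = /(?<=[AND])\s?\[(.*?)\]/;

// Hypothetical inputs, chosen only to illustrate the behaviour.
const samples = ["AND [a, b]", "BAD [a, b]", "N[a, b]", "OR [a, b]"];

for (const sample of samples) {
  const match = sample.match(charSetPattern);
  // Print the captured group, or null when the pattern does not match.
  console.log(sample, "=>", match ? match[1] : null);
}
```

The first three inputs all yield "a, b" because each bracket is preceded by a D or an N; only "OR [a, b]" yields null, since R is not in the set.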

Instead, use alternatives (denoted by vertical pipes |) correctly:

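const test1 = `AND [thing1, thing2, thing3] OR [thing4, thing5] NOT [thing6]`;
const test2 = `AND [thing1, thing2, thing3 OR [thing4, thing5]`; // missing closing bracket

// After AND, OR, or NOT: capture lazily up to a closing bracket or the next keyword.
const pattern = /(?<=AND|OR|NOT)(?:\s?)\[(.*?)(?:\]|\s(AND|OR|NOT))/g;

console.log([...test1.matchAll(pattern)].map(match => match[1]));
// ["thing1, thing2, thing3", "thing4, thing5", "thing6"]
console.log([...test2.matchAll(pattern)].map(match => match[1]));
// ["thing1, thing2, thing3", "thing4, thing5"]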


Since you're new to regular expressions, I'd recommend using a utility like regex101 to build your patterns. By default, a pane on the right-hand side of the window explains, in plain English, what each part of your pattern actually does, which you can compare with what you expect it to do and adjust accordingly.

CodePudding user response:

  • Using [AND] means a character class, and matches a single character: A, N, or D
  • Using [^NOT] means a negated character class, matching any single character except N, O, or T
  • Using ^NOT|^OR means matching either NOT or OR at the start of the string

If you want to match one of 3 alternatives, you can do so using an alternation in a group: (?:AND|OR|NOT)
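As a quick way to see the difference between these constructs, the following sketch runs each of them against a few hypothetical strings (the variable names and inputs are illustrative only):

```javascript
// Hypothetical sample strings, used only to contrast the three constructs.
const negatedClass = /[^NOT]/;        // any one character that is not N, O, or T
const anchored = /^NOT|^OR/;          // NOT or OR, but only at the start of the string
const alternation = /(?:AND|OR|NOT)/; // one of the three whole words, anywhere

console.log(negatedClass.test("NOT"));        // false: every character is N, O, or T
console.log(negatedClass.test("NOTE"));       // true: E is outside the set
console.log(anchored.test("a OR b"));         // false: OR is not at the start
console.log(anchored.test("OR b"));           // true
console.log(alternation.test("a OR b"));      // true
console.log("x AND y".match(alternation)[0]); // "AND"
```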

Using JavaScript, you might assert that there is no AND, OR, or NOT between the opening and the closing square bracket:

(?<=(?:AND|OR|NOT)\s?\[)(?![^\][]*\b(?:AND|OR|NOT)\b)[^\][]*(?=\])
  • (?<=(?:AND|OR|NOT)\s?\[) Positive lookbehind, assert any of the alternatives to the left
  • (?![^\][]*\b(?:AND|OR|NOT)\b) Negative lookahead, assert that none of the alternatives occurs as a whole word before the next [ or ]
  • [^\][]* Match zero or more characters other than [ and ]
  • (?=\]) Positive lookahead, assert ] to the right
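A minimal sketch of how the pattern above might be applied in JavaScript, using the two example strings from the question. The helper name contents is an assumption made for illustration; lookbehind support in the runtime is also assumed.

```javascript
// The pattern from this answer, with the g flag so matchAll can be used.
const guarded = /(?<=(?:AND|OR|NOT)\s?\[)(?![^\][]*\b(?:AND|OR|NOT)\b)[^\][]*(?=\])/g;

const wellFormed = "AND [thing1, thing2, thing3] OR [thing4, thing5] NOT [thing6]";
const missingBracket = "AND [thing1, thing2, thing3 OR [thing4, thing5]";

// Hypothetical helper: the full match (match[0]) is the bracket content,
// because the brackets and keywords are only asserted, never consumed.
const contents = (input) => [...input.matchAll(guarded)].map((m) => m[0]);

console.log(contents(wellFormed));
// ["thing1, thing2, thing3", "thing4, thing5", "thing6"]
console.log(contents(missingBracket));
// ["thing4, thing5"]
```

Unlike a pattern that stops capturing at the next keyword, this one skips the unclosed AND group entirely, which is the behaviour the question asks for.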
