How to limit the search scope without regex lookbehinds?

Time:12-06

Given a regular expression, I can easily choose where in a string the search for a match starts by using lastIndex.
Now I also want to make sure that the match I get does not extend past a certain point in the string.

I would happily enclose the regular expression in a non-capturing group and append, for instance, (?<=^.{0,8}).

But how can I achieve the same goal without lookbehinds, which still aren't supported everywhere?

Note:

  • While it might be the only reasonable fallback, slicing the string is not a good option as it results in a loss of context for the search.

Example

https://regex101.com/r/7bWtSW/1

with the base regular expression that:

  • matches the letter 'a', at least once and as many times as possible
  • as long as an 'X' comes later

We can see that the lookbehind achieves our goal: we still get a match, just a shorter one.
However, if we sliced the string, we would lose the match (because the lookahead in the base regular expression would fail).
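
To make the difference concrete, here is a minimal sketch of both behaviors. The input 'aaaaaaaX' and the limit of 4 characters are assumptions chosen for illustration, and the snippet needs an engine that supports lookbehinds.

```javascript
// Illustrative input and limit (assumptions, not taken from the linked demo).
const input = 'aaaaaaaX';
const limit = 4;

// Base expression: one or more 'a', as long as an 'X' comes later.
const base = /a+(?=.*X)/;

// Lookbehind variant: the match must end within the first `limit` characters.
const bounded = new RegExp(`(?:${base.source})(?<=^.{0,${limit}})`);
const boundedMatch = input.match(bounded);

// Slicing removes the 'X', so the lookahead can no longer succeed.
const slicedMatch = input.slice(0, limit).match(base);

console.log(boundedMatch && boundedMatch[0]); // 'aaaa'
console.log(slicedMatch);                     // null
```

The lookbehind keeps the whole string available to the lookahead, which is exactly the context that slicing throws away.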

CodePudding user response:

You can use a lookahead containing your regex, which puts you at the beginning of the match, and then match any x characters with .{x}.

For your example it'd be:

String: aaaaaaaX

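(?=a+X).{4}

const input = 'aaaaaaaX'
// The lookahead fixes the position at the start of a run of 'a' followed by 'X';
// .{4} then consumes exactly 4 characters from there.
const regex = /(?=a+X).{4}/
console.log(input.match(regex))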

CodePudding user response:

Your pattern in the regex demo, (?:a+(?=.*X))(?<=^.{0,4}), uses a lookbehind assertion that can yield multiple separate matches.

See a regex demo for the same pattern with multiple matches in the same string

Without using a lookbehind, you can not get those separate matches.
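
As a rough illustration of what "multiple separate matches" means here, the sketch below runs the lookbehind pattern with the global flag. The string 'abaaX' is an assumption chosen for illustration, and the snippet requires lookbehind support.

```javascript
// Lookbehind pattern from the question, with the global flag added.
const withLookbehind = /(?:a+(?=.*X))(?<=^.{0,4})/g;

// Illustrative string: both runs of 'a' end within the first 4 characters.
const separateRuns = 'abaaX'.match(withLookbehind);

console.log(separateRuns); // [ 'a', 'aa' ]
```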

What you can do instead is add an extra step: first match the part of the string that fulfills the length restriction (in this case the group 1 value), then collect all runs of consecutive a characters within that part.

^([^\nX]{0,3}a)[^\nX]*X

The pattern matches

  • ^ Start of string
  • ( Capture group 1
    • [^\nX]{0,3}a Match 0-3 times a char other than a newline or X and then match a
  • ) Close group 1
  • [^\nX]*X Match optional chars other than a newline or X and then match X

Regex demo

const regex = /^([^\nX]{0,3}a)[^\nX]*X/;
[
  "aaaaaaaaX",
  "baaaaaaaaX",
  "bbaaaaaaaaX",
  "bbbaaaaaaaaX",
  "bbbbaaaaaaaaX",
  "babaaaaaaaaX",
  "aX",
  "abaaX"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1].match(/a+/g))
  }
})
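
If the limit needs to vary, the two steps could be wrapped in a small function. This is only a sketch: `boundedRuns` and its `limit` parameter are hypothetical names, and it inherits the behavior of the pattern above, including the requirement that no X appears before the captured part.

```javascript
// Hypothetical helper: `boundedRuns` and `limit` are illustrative names.
// Applies the two-step pattern above with a configurable limit and returns
// the runs of 'a' found inside the captured prefix.
function boundedRuns(s, limit) {
  if (!Number.isInteger(limit) || limit < 1) return [];
  // Step 1: capture a prefix of at most `limit` characters that ends with 'a',
  // followed later on the same line by an 'X'.
  const prefix = new RegExp(`^([^\\nX]{0,${limit - 1}}a)[^\\nX]*X`);
  const m = s.match(prefix);
  // Step 2: collect the runs of 'a' inside the captured prefix only.
  return m ? m[1].match(/a+/g) : [];
}

console.log(boundedRuns('abaaX', 4));         // [ 'a', 'aa' ]
console.log(boundedRuns('aaaaaaaaX', 4));     // [ 'aaaa' ]
console.log(boundedRuns('bbbbaaaaaaaaX', 4)); // []
```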
