How can I match a regex where a string does not contain a specific word without lookaheads/behinds?

Time:11-03

TL;DR: I am trying to match a pattern only when a specific string does not appear in it.

How can I match a string where a word does not appear in the string? I am writing the code in Go, so I do not have access to any of the lookaround options, and I have to do this in pure regex without relying on external libraries. I believe I could accomplish this easily with a negative lookahead, but I am restricted from using one. Client-side assertions (i.e. checking whether the first pattern matches and then inspecting the groups) aren't really an option here either.
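For readers unfamiliar with the restriction, here is a minimal sketch showing that Go's standard `regexp` package rejects lookaheads at compile time. The helper name `compiles` and the two patterns are made up for illustration; they are not part of the original question.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// A negative lookahead is not part of the supported syntax,
	// so compilation returns an error.
	fmt.Println(compiles(`new\sMyClass\((?!.*AnotherClass)`))

	// The same prefix without the lookahead compiles fine.
	fmt.Println(compiles(`new\sMyClass\(`))
}
```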

Here is the sample data. In this data, I would like to capture all instances of new MyClass that do not have new AnotherClass as an argument.

# case 0
return new MyClass(constructor, options, ...);

# case 1
var mc = new MyClass();
# case 2
var mc = new MyClass(new AnotherClass());
# case 3
var amc = new MyClass(options, new AnotherClass(), ...);

# case 4
MyClass mc = new MyClass(
  something
);

# case 8
MyClass mc = new MyClass(
  something, new Render()
);

# case 5
MyClass mc = new MyClass(
  new AnotherClass()
);

# case 6
MyClass mc = new MyClass(
  options, new AnotherClass()
);

# case 7
var amc = new MyClass(new AnotherClass(), options, ...);

So in this case, I want to match cases 0, 1, 4 and 8. So far I can accomplish this using a group with the following regex: (?s)(?:new\sMyClass\([^;]*?new\sAnotherClass\(\).*?\))|(new\sMyClass\([^;]*?\)). Note that I am using the (?s) flag because the data can contain newlines.

The regex that I have now is: regex 101

The example is not quite what I am after, as it returns both desired and undesired results in the match. What I am trying to do is match only the new MyClass instances where new AnotherClass is not an argument, without using groups. With groups, I get both kinds of match, and I am trying to narrow it down to only one; i.e. only the blue matches on regex101.
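To make the behaviour described above concrete, here is a small sketch that runs the regex from the question over two statements and prints capture group 1 for each match. The helper name `groupOne` and the shortened input are assumptions for illustration only. The undesired statement still produces a match, just with an empty group 1, which is exactly the problem being described.

```go
package main

import (
	"fmt"
	"regexp"
)

// The regex from the question, unchanged.
var grouped = regexp.MustCompile(`(?s)(?:new\sMyClass\([^;]*?new\sAnotherClass\(\).*?\))|(new\sMyClass\([^;]*?\))`)

// groupOne returns the value of capture group 1 for every match.
// The value is "" when the first (non-capturing) alternative matched.
func groupOne(src string) []string {
	var out []string
	for _, m := range grouped.FindAllStringSubmatch(src, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	src := "var a = new MyClass();\nvar b = new MyClass(new AnotherClass());"
	// Both statements match; only the first one fills group 1.
	fmt.Printf("%q\n", groupOne(src))
}
```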

Happy to answer any clarifying question if the question is not clear.

CodePudding user response:

Using a negative lookahead with a tempered dot is how I would generally do this. In the absence of lookarounds, you could phrase your logic as not matching any MyClass which has an instance of AnotherClass() in it, but also still matching MyClass:

matches:
new MyClass\(.*?\)

does NOT match:
new MyClass\([^)]*AnotherClass\(\).*\)
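A rough sketch of how these two patterns could be combined in Go, under a few assumptions that are not in the answer: the input is split into statements on `;`, the `(?s)` flag is added so the patterns can span newlines, and the helper name `keep` is made up. Note that this is a two-step check in code, which the question says is not really an option, so treat it only as an illustration of the logic.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Pattern that should match (from the answer, plus the (?s) flag).
	wanted = regexp.MustCompile(`(?s)new MyClass\(.*?\)`)
	// Pattern that must NOT match (from the answer, plus the (?s) flag).
	unwanted = regexp.MustCompile(`(?s)new MyClass\([^)]*AnotherClass\(\).*\)`)
)

// keep reports whether a single statement creates a MyClass
// without an AnotherClass() argument, according to the two patterns.
func keep(stmt string) bool {
	return wanted.MatchString(stmt) && !unwanted.MatchString(stmt)
}

func main() {
	src := "var a = new MyClass();\nvar b = new MyClass(options, new AnotherClass(), ...);"
	// Assumption: statements are separated by ';'.
	for _, stmt := range strings.Split(src, ";") {
		stmt = strings.TrimSpace(stmt)
		if stmt == "" {
			continue
		}
		fmt.Printf("%q keep=%v\n", stmt, keep(stmt))
	}
}
```

Because the second pattern uses `[^)]*`, an AnotherClass() that comes after some other call with parentheses would slip through; that limitation comes from the pattern itself.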

CodePudding user response:

Match when there are no brackets inside the brackets:

new\sMyClass\([^();]*\)
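A quick way to try this pattern in Go is sketched below; the variable name `flat` and the inputs are assumptions taken from the question's sample data. One caveat worth seeing in the output: because any nested parentheses block the match, a call such as case 8 (with new Render() as an argument) is skipped too, even though the question wants it matched.

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern from the answer: no '(', ')' or ';' between the brackets.
var flat = regexp.MustCompile(`new\sMyClass\([^();]*\)`)

func main() {
	samples := []string{
		"var mc = new MyClass();",
		"var mc = new MyClass(new AnotherClass());",
		"MyClass mc = new MyClass(\n  something\n);",
		"MyClass mc = new MyClass(\n  something, new Render()\n);",
	}
	for _, s := range samples {
		// FindString returns "" when there is no match.
		fmt.Printf("%q -> %q\n", s, flat.FindString(s))
	}
}
```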

CodePudding user response:

The regexp library used in Go supports only a limited set of regex features (no lookarounds), so the closest solution here is to match new MyClass(...) with no new AnotherClass substring between the parentheses:

(?s)new\sMyClass\((?:[^;n]|n(?:n|e(?:n|w(?:n|\s(?:n|An(?:ew\sAn)*(?:n|o(?:n|t(?:n|h(?:n|e(?:n|r(?:n|C(?:n|l(?:n|a(?:n|sn))))))))|e(?:n|w(?:n|\sn)))))))*(?:[^;en]|e(?:[^;nw]|w(?:[^;\sn]|\s(?:[^;An]|A(?:[^;n]|n(?:ew\sAn)*(?:[^;eno]|o(?:[^;nt]|t(?:[^;hn]|h(?:[^;en]|e(?:[^;nr]|r(?:[^;Cn]|C(?:[^;ln]|l(?:[^;an]|a(?:[^;ns]|s[^;ns]))))))))|e(?:[^;nw]|w(?:[^;\sn]|\s(?:[^;An]|A[^;n]))))))))))*(?:n(?:n|e(?:n|w(?:n|\s(?:n|An(?:ew\sAn)*(?:n|o(?:n|t(?:n|h(?:n|e(?:n|r(?:n|C(?:n|l(?:n|a(?:n|sn))))))))|e(?:n|w(?:n|\sn)))))))*(?:e(?:(?:w(?:\sA?)?)?|w\sAn(?:ew\sAn)*(?:o(?:t?|th(?:e?|er(?:C?|Cl(?:a?|as))))|e(?:w(?:\sA?)?)?)?))?)?\)

See the regex demo.

The details on how to obtain this regex are in the "Regex: match everything but specific pattern" post.
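The idea behind such a regex is easier to see on a tiny case. The sketch below is not the regex above; it is a hand-built pattern for a hypothetical forbidden word "ab", matching whole strings that do not contain it, using only character classes and alternation. The long regex in this answer applies the same kind of construction to "new AnotherClass".

```go
package main

import (
	"fmt"
	"regexp"
)

// Matches an entire string that does not contain "ab":
//   [^a]       any character that cannot start the forbidden word
//   a+[^ab]    a run of 'a's followed by something other than 'a' or 'b'
//   a*         an optional trailing run of 'a's at the end
var noAB = regexp.MustCompile(`^(?:[^a]|a+[^ab])*a*$`)

func main() {
	for _, s := range []string{"", "ba", "acb", "ab", "cab", "aab"} {
		fmt.Printf("%q contains no \"ab\": %v\n", s, noAB.MatchString(s))
	}
}
```

Each extra letter in the forbidden word adds another level of "prefix seen so far" to track, which is why the real pattern grows so large.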
