Regex: Match a character that is unmatched in a negated set only a certain number of times at the en

Time:12-01

In JavaScript, I want to match (i.e. include in the result) a closing parenthesis ")" only if it appears exactly twice in a row at the end of the string, and to leave it out if it appears once or more than twice. The answer is probably to remove the parenthesis from the negated set and handle it somewhere else in the pattern; I have tried adapting this approach without success. My regex is fairly big and tricky, and, as far as I can tell with my limited regex experience, some constructs stop having any effect once they are combined. Here is my regex:

/(?<![a-zA-Z/] )(https?:)?\/{1,2}[a-zA-Z]\S*(?<=\){2}|(?<=[^:"'\]);,]))/g

My attempt is one step away from working: the current regex matches the parentheses when there are exactly two of them and leaves out a single one, but it keeps matching when there are more than two. For example:


Example URL: https://docs.google.com/picker?protocol=gadgets [...] &nav=(("fonts"))

Results:

[...] &nav=(("fonts") - doesn't match ") - good

[...] &nav=(("fonts")) - matches )) - good

[...] &nav=(("fonts"))) [...] - matches ))), although it does not match the unwanted characters in the negated set - (kind of) bad

..and so on...


I have tried different lookarounds, quantifiers, and combinations of the two, with no better result than the regex above.

By the way, I would rather not use the start (^) and end ($) anchors in the regex, because I run it with the global flag against large and varied scripts. I may be mistaken about this, so correct me if necessary. If the anchors turn out to be required, as they were when I tried simpler regexes, I do not mind too much.

As Wiktor Stribiżew requested, here is the expected behavior of the regex with the aforementioned example:


Expected results:

https://docs.google.com/picker?protocol=gadgets [...] &nav=(("fonts") - should match https://docs.google.com/picker?protocol=gadgets [...] &nav=(("fonts

https://docs.google.com/picker?protocol=gadgets [...] &nav=(("fonts")) - should match all the URL (the original URL)

https://docs.google.com/picker?protocol=gadgets [...] &nav=(("fonts"))) - should match https://docs.google.com/picker?protocol=gadgets [...] &nav=(("fonts

CodePudding user response:

It seems you can use

(?<![a-zA-Z/])(?:https?:)?\/{1,2}[a-zA-Z]\S*(?:[^\s:"'\\)]|(?<!\))\)\)(?!\S))

Or, to account for any non-word chars,

(?<![a-zA-Z/])(?:https?:)?\/{1,2}[a-zA-Z]\S*(?:\b|(?<!\))\)\)(?!\S))

See the regex demo. Details:

  • (?<![a-zA-Z/]) - a negative lookbehind that fails the match if there is a letter or / immediately to the left of the current location
  • (?:https?:)? - an optional http: or https: string
  • \/{1,2} - one or two /s
  • [a-zA-Z] - a letter
  • \S* - zero or more non-whitespaces
  • (?:\b|(?<!\))\)\)(?!\S)) - either a word boundary or a )) string not preceded by another ) and not directly followed with a non-whitespace char.
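As a quick check, the second pattern can be run against the three cases from the question. This is only a sketch: the `example.com` URLs are shortened stand-ins for the elided Google URL, `findUrls` is a hypothetical helper name, the `g` flag is added so that the pattern can scan longer text, and the `/` inside the character class is escaped, which does not change its meaning. It assumes a JavaScript engine with lookbehind support.

```javascript
// Second pattern from the answer, with the `g` flag added.
// Assumption: the engine supports lookbehind (e.g. a recent Node.js or Chromium).
const urlPattern =
  /(?<![a-zA-Z\/])(?:https?:)?\/{1,2}[a-zA-Z]\S*(?:\b|(?<!\))\)\)(?!\S))/g;

// Hypothetical helper: returns every match in `text`, or an empty array.
function findUrls(text) {
  return text.match(urlPattern) || [];
}

// Shortened stand-ins for the URL in the question.
const one = 'https://example.com/picker?nav=(("fonts")';
const two = 'https://example.com/picker?nav=(("fonts"))';
const three = 'https://example.com/picker?nav=(("fonts")))';

console.log(findUrls(one));   // the match stops after "fonts"
console.log(findUrls(two));   // the whole URL, including "))"
console.log(findUrls(three)); // the match stops after "fonts"

// No ^ or $ is needed: (?!\S) only requires that whitespace or the end
// of the input follows "))", so several URLs can be found in one text.
const mixed = 'see https://example.com/a?nav=(("x")) and //cdn.example.com/lib.js, ok';
console.log(findUrls(mixed));
```

Because `(?!\S)` succeeds both before whitespace and at the end of the input, the pattern behaves the same whether the URL ends the string or sits in the middle of a larger script, which is why the `^` and `$` anchors are not required here.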