RegExp - How to match all except the last one if it is followed by something?

Time:10-03

I have these example strings. Each starts with a specific string (e.g. T), followed by snake case parts separated by ., and is sometimes followed by ():

T.name.other_name.another_name

T.name.again_name.ect_name.last_name()

I'm trying to use a RegExp to match all the snake case parts (without the .) but not the last one if it is followed by ().

So the matches should be:

name other_name another_name

name again_name ect_name (and not last_name).

But I cannot manage to find one. How can I do that?


If the matches include the . that is fine too:

name.other_name.another_name

name.again_name.ect_name (but not .last_name())


I tried this regexp:

T((\.([a-z]|\_)*)*)(\.([a_z]|\_)\(\))?

and wanted to extract the 2nd group match, but it always includes last_name.
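
The behaviour described can be reproduced with a short JavaScript check. This is only a sketch: the variable names are made up for illustration, and the pattern is the attempt above, unchanged. Because the repeated group is greedy and the trailing group is optional, the outer group consumes every part, including the one before ().

```javascript
// The attempted pattern, copied as-is (including its [a_z] class).
const ATTEMPT = /T((\.([a-z]|\_)*)*)(\.([a_z]|\_)\(\))?/;

// Illustrative names, not part of the original question.
const attemptMatch = ATTEMPT.exec("T.name.again_name.ect_name.last_name()");
const outerGroup = attemptMatch[1];

// The greedy repetition has already swallowed ".last_name".
console.log(outerGroup);
```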

CodePudding user response:

Here is what you need to use in the Highlight Visual Studio Code extension settings:

"highlight.regexes": {
    "(?<=\\bT(?:\\.[a-z_]+)*\\.)([a-z_]+)\\b(?!\\(\\))": {
        "regexFlags": "g",
        "decorations": [
            { "color": "yellow" }
        ]
    }
}

Notes:

  • "regexFlags": "g" is important because by default the highlighting is case insensitive; with g alone, matching is case sensitive. If you need case-insensitive matching, add i.
  • Make sure the regex escape sequences are formed with double backslashes.
  • There is at least one capturing group: the decorations are applied to capturing groups, and you may define as many as there are groups.
  • The regex flavor is JavaScript, so you can use infinite-length lookbehind patterns.

Regex details:

  • (?<=\bT(?:\.[a-z_]+)*\.) - a positive lookbehind that matches a location that is immediately preceded by a whole word T (\b is a word boundary), followed by zero or more occurrences of a . and one or more lowercase ASCII letters or _ chars, and then a . char
  • ([a-z_]+) - Capturing group 1: one or more lowercase ASCII letters or _ chars
  • \b - a word boundary (used to make sure the lookahead that follows is executed only once, which prevents backtracking into the captured word pattern)
  • (?!\(\)) - a negative lookahead that fails the match if there is () text immediately to the right of the current location.
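
As a quick check outside the editor, the same pattern can be run in plain JavaScript. This is a minimal sketch, assuming an engine with lookbehind support (for example a recent Node.js); the helper name snakeParts is made up for illustration.

```javascript
// Same pattern as in the settings above, written as a regex literal
// (single backslashes here, since it is not inside a JSON string).
const PART = /(?<=\bT(?:\.[a-z_]+)*\.)([a-z_]+)\b(?!\(\))/g;

// Hypothetical helper: returns every captured snake case part of `text`.
function snakeParts(text) {
  return Array.from(text.matchAll(PART), (m) => m[1]);
}

console.log(snakeParts("T.name.other_name.another_name"));
// → [ 'name', 'other_name', 'another_name' ]
console.log(snakeParts("T.name.again_name.ect_name.last_name()"));
// → [ 'name', 'again_name', 'ect_name' ]
```

The last part before () is skipped because the word boundary stops the engine from shortening the captured word to get past the negative lookahead.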

CodePudding user response:

If simplicity is the goal, a single assertion at the end of the pattern
may be all you need.

Overall, though, the easiest approach is to make a single full match, then split
the result in capture group 1 on the periods.

T\.((?:[a-z_]*\.)*[a-z_]*)(?![a-z_]*\(\))  

https://regex101.com/r/W89xxe/1

 T
 \. 
 (                             # (1 start)
    (?: [a-z_]* \. )*
    [a-z_]* 
 )                             # (1 end)
 (?! [a-z_]* \( \) )
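
The match-then-split approach can be sketched in JavaScript as follows. The helper name pathParts and the filtering of empty strings are assumptions added for illustration; the pattern itself is the one given above.

```javascript
// Pattern from above: group 1 holds the dotted path, and the trailing
// lookahead makes the engine back off a final part that is followed by ().
const PATH = /T\.((?:[a-z_]*\.)*[a-z_]*)(?![a-z_]*\(\))/;

// Hypothetical helper: match once, then split group 1 on the periods.
function pathParts(text) {
  const m = PATH.exec(text);
  // filter(Boolean) drops empty strings, e.g. when group 1 is empty.
  return m ? m[1].split(".").filter(Boolean) : [];
}

console.log(pathParts("T.name.other_name.another_name"));
// → [ 'name', 'other_name', 'another_name' ]
console.log(pathParts("T.name.again_name.ect_name.last_name()"));
// → [ 'name', 'again_name', 'ect_name' ]
```

Compared with the lookbehind version, this needs only one match per string and leaves the splitting to ordinary string code.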