How to avoid extracting a specific trailing character that is part of a regex group?

Time:01-03

Given a command line

mycommand --optional-arguments their-values <patternOfInterestWithDirectoryPath> arg1 arg2

patternOfInterestWithDirectoryPath can be any of the following:

path/to/dir
/path/to/dir
path/to/dir/
"path/to/dir"
"/path/to/dir"
"path/to/dir/"

In all of the above, the ask is to extract /path/to/dir. The argument may or may not be enclosed in double quotes, and may or may not have a leading or trailing /.

The following regex does match, but it also extracts the trailing forward slash.

 \S*mycommand\s+(?:-\S+\s+)*\"?([^\"]+)\/?\"?.*

Attaching a negative lookahead like this does not work

 \S*mycommand\s+(?:-\S+\s+)*"?([^\s"]+(?!\/"))\/?"?.*
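
To make the problem concrete, here is a quick check of both attempts, sketched in Python with the standard re module. The sample line and its --opt=val option syntax are hypothetical, and the patterns assume the quantifiers read \s+, \S+ and [^"]+.

```python
import re

# Hypothetical sample line; the --opt=val option syntax is an assumption.
line = 'mycommand --opt=val "path/to/dir/" arg1 arg2'

# First attempt: the greedy [^"]+ consumes the trailing slash before the
# optional \/? that follows the group is ever tried.
first = re.match(r'\S*mycommand\s+(?:-\S+\s+)*\"?([^\"]+)\/?\"?.*', line)

# Second attempt: (?!\/") is tested only after the slash has been consumed.
# At that position the next character is the closing quote, not /", so the
# lookahead succeeds and the slash stays inside the group.
second = re.match(r'\S*mycommand\s+(?:-\S+\s+)*"?([^\s"]+(?!\/"))\/?"?.*', line)

print(first.group(1))   # path/to/dir/
print(second.group(1))  # path/to/dir/
```

In both cases the slash is already inside the capture by the time the rest of the pattern gets a chance to match it, which is why a check placed after the group looks at the wrong position.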

Any clue how to exclude from extraction characters that are matched by the regex group but sit at a specific position (e.g. the rightmost one)?

Answer:

You can use

\S*mycommand\s+(?:-\S+\s+)*(?|"([^"]*?)\/?"|(\S+)(?<!\/)).*

See the regex demo. Details:

  • \S* - zero or more non-whitespace chars
  • mycommand - a literal string
  • \s+ - one or more whitespaces
  • (?:-\S+\s+)* - zero or more occurrences of -, one or more non-whitespaces, one or more whitespaces
  • (?|"([^"]*?)\/?"|(\S+)(?<!\/)) - a branch reset group that matches either:
    • "([^"]*?)\/?" - ", Group 1 capturing any zero or more chars other than a ", as few as possible, and then an optional / and a " char
    • | - or
    • (\S+)(?<!\/) - Group 1 (group ID is still 1 as it is inside a branch reset group): one or more non-whitespace chars with no / at the end
  • .* - the rest of the line.
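
The breakdown above can be tried out with a minimal sketch in Python. Python's built-in re module does not support the branch reset group (?|...), so this version swaps in two ordinary capture groups and returns whichever one matched; it is an adaptation, not the exact pattern from the answer. The helper name extract_path and the sample lines (including the --opt=val option syntax) are hypothetical.

```python
import re

# Adaptation for Python's re, which has no branch reset group (?|...):
# two plain capture groups stand in for the shared Group 1.
PATTERN = re.compile(
    r'\S*mycommand\s+'   # command name, optionally preceded by a path
    r'(?:-\S+\s+)*'      # zero or more "-option" tokens, each followed by whitespace
    r'(?:"([^"]*?)/?"'   # quoted path: group 1, optional trailing / kept outside
    r'|(\S+)(?<!/))'     # unquoted path: group 2, must not end with /
    r'.*'                # the rest of the line
)

def extract_path(line):
    """Return the path without quotes or a trailing slash, or None if no match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)

# Hypothetical sample lines; the --opt=val option syntax is an assumption.
for path_arg in ['path/to/dir', '/path/to/dir', 'path/to/dir/',
                 '"path/to/dir"', '"/path/to/dir"', '"path/to/dir/"']:
    print(extract_path('mycommand --opt=val ' + path_arg + ' arg1 arg2'))
```

Note that (?:-\S+\s+)* only skips tokens that start with -. If an option value is passed as a separate word, as in --optional-arguments their-values, that value is what gets captured, so the option part of the pattern would need adjusting for such command lines.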