Regex: Match up to, but not including word, and still match if word does not exist

Time:10-07

I have files in the following format:

123_example-1234567_any-characters_here!-ignore.ext

And I want to capture the four groups:

  1. 123_example
  2. 1234567
  3. any-characters_here!
  4. .ext

Which I can do just fine with something like:

^(\d{3}_[^\-]+)-(\d+)_(.+)-ignore(\.ext)$

However, sometimes these files do not have the -ignore string (assume this string can only ever be -ignore). For example:

123_example-1234567_any-characters_here!.ext

How can I modify my regex so that it matches both strings and returns the same groups?

My attempt is at https://regex101.com/r/pOnEIe/1, where I thought a capture group inside a non-capturing group might be the answer.

CodePudding user response:

The third capture group should use a non-greedy quantifier, and it should be followed by an optional non-capturing group for -ignore.

Note that the negated character class [^-] might also match newlines.

^(\d{3}_[^-]+)-(\d+)_(.+?)(?:-ignore)?(\.ext)$

Explanation

  • ^ Start of string
  • (\d{3}_[^-]+) Capture 3 digits, _ and 1 or more chars other than -
  • -(\d+)_ Match -, capture 1 or more digits and match _
  • (.+?) Capture 1 or more chars, as few as possible
  • (?:-ignore)? Optionally match -ignore
  • (\.ext) Capture .ext
  • $ End of string
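
As a quick check, the pattern can be run against both file names from the question. This is a minimal sketch in Python (the question does not name a language, so the `re` module is an assumption, and `split_name` is a hypothetical helper introduced only for illustration):

```python
import re

# Pattern from the answer: lazy third group, then an optional
# non-capturing group for "-ignore", then the extension.
PATTERN = re.compile(r"^(\d{3}_[^-]+)-(\d+)_(.+?)(?:-ignore)?(\.ext)$")


def split_name(filename):
    """Return the four captured groups, or None if the name does not match."""
    match = PATTERN.match(filename)
    return match.groups() if match else None


# Both sample names from the question should yield the same four groups.
with_suffix = split_name("123_example-1234567_any-characters_here!-ignore.ext")
without_suffix = split_name("123_example-1234567_any-characters_here!.ext")
print(with_suffix)
print(without_suffix)

# The newline caveat: [^-] is a negated class, so it also accepts "\n".
# This made-up name still matches, with a newline inside group 1.
multiline = split_name("123_exa\nmple-1234567_x.ext")
print(multiline is not None)
```

If the lazy quantifier were greedy instead, the third group would swallow -ignore whenever it is present, because the optional group is allowed to match nothing.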
