Regex : how to optional capture a group

Time:10-16

I'm trying to make a substring optional. Here is the source:

Movie TOTO S09 E22 2022 Copyright

I want to optionally capture the substring: S09 E22

What I have tried so far:

/(Movie)(.*)(S\d\d\s*E\d\d)?/gmi

The problem is that it ends up matching S09 E22 2022 Copyright as part of group 2, instead of capturing just S09 E22:

Match 1:  0-33  Movie TOTO S09 E22 2022 Copyright
Group 1:  0-5   Movie
Group 2:  5-33  TOTO S09 E22 2022 Copyright

Is there any way to fix this issue?

Regards

CodePudding user response:

You get that match because the .* is greedy and will first match until the end of the string.

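Your (S\d\d\s*E\d\d)? is optional, so it can match the empty string at the end of the line. The overall match then succeeds as it is, and the engine never backtracks into the .* to find S09 E22.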
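A minimal sketch reproducing this, assuming Python's `re` module (my choice of language; the question only shows the pattern). The greedy `(.*)` takes the rest of the line and the optional third group is left unset:

```python
import re

text = "Movie TOTO S09 E22 2022 Copyright"

# The pattern from the question, with its i and m flags.
pattern = re.compile(r"(Movie)(.*)(S\d\d\s*E\d\d)?", re.IGNORECASE | re.MULTILINE)
m = pattern.search(text)

# (.*) has already consumed the rest of the line, and the optional group
# succeeds by matching nothing, so no backtracking is needed.
print(m.span())          # (0, 33)
print(repr(m.group(2)))  # ' TOTO S09 E22 2022 Copyright'
print(m.group(3))        # None
```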

If you don't want partial matches for S09 or E22, the 4 digits for the year are not mandatory, and you have movie titles longer than 1 word, with PCRE you could use:

\b(Movie)\b\h+((?:(?!\h+[SE]\d+\b).)*)(?:\h(S\d+\h+E\d+))?
  • \b(Movie)\b Capture the word Movie
  • ( Capture group
    • (?: Non capture group to repeat as a whole part
      • (?!\h+[SE]\d+\b). Match any character if neither an S01 nor an E22 part is directly to the right (where [SE] matches either an S or E char, and \h+ matches 1 or more horizontal whitespace chars)
    • )* Close the non capture group and optionally repeat it
  • ) Close capture group
  • (?:\h(S\d+\h+E\d+))? Optionally capture the S01 E22 part (where \d+ matches 1 or more digits)

Regex demo
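A rough Python port of the pattern above, as a sketch. Python's `re` has no `\h`, so `[ \t]` stands in for it here; `parse` is a hypothetical helper name used only for illustration:

```python
import re

# [ \t] approximates PCRE's \h (horizontal whitespace), which re lacks.
pattern = re.compile(
    r"\b(Movie)\b[ \t]+"            # the word Movie, then horizontal whitespace
    r"((?:(?![ \t]+[SE]\d+\b).)*)"  # title: stop where " S01" / " E22" starts
    r"(?:[ \t](S\d+[ \t]+E\d+))?"   # optionally capture the S01 E22 part
)

def parse(line):
    # Hypothetical helper: return all groups, or None if there is no match.
    m = pattern.search(line)
    return m.groups() if m else None

print(parse("Movie TOTO S09 E22 2022 Copyright"))
# ('Movie', 'TOTO', 'S09 E22')
print(parse("Movie THE BIG TOTO 2022 Copyright"))
# ('Movie', 'THE BIG TOTO 2022 Copyright', None)
```

When the S01 E22 part is absent, the title group simply runs to the end of the line and group 3 stays unset.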

Another option: capture the S01 E22 part if it is present, or else match the rest of the line:

\b(Movie)\h+([^S\n]*(?:S(?!\d+\h+E\d+\b)[^S\n]*)*+)(S\d+\h+E\d+)?

Regex demo
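A sketch of the same idea in Python, assuming `[ \t]` as a replacement for `\h` and a plain `*` in place of the possessive `*+`. Because the trailing optional group always succeeds, the engine keeps the first (greedy) match of group 2 either way, so the captures are the same here. `parse` is again a hypothetical helper:

```python
import re

# \h -> [ \t]; possessive *+ -> plain * (same captures for this pattern).
pattern = re.compile(
    r"\b(Movie)[ \t]+"
    r"([^S\n]*(?:S(?!\d+[ \t]+E\d+\b)[^S\n]*)*)"  # up to where "S01 E22" starts
    r"(S\d+[ \t]+E\d+)?"                         # optional S01 E22 part
)

def parse(line):
    # Hypothetical helper: return all groups, or None if there is no match.
    m = pattern.search(line)
    return m.groups() if m else None

print(parse("Movie TOTO S09 E22 2022 Copyright"))
# ('Movie', 'TOTO ', 'S09 E22')
print(parse("Movie SOME TOTO 2022 Copyright"))
# ('Movie', 'SOME TOTO 2022 Copyright', None)
```

Note that with this pattern the title group keeps the space before S09.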

CodePudding user response:

Based on your shown samples and attempts, please try the following regex:

^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))

Here is an online demo of this regex.

Explanation: a detailed breakdown of the regex above.

^Movie\s+\S+\s+   ##Match the string Movie at the start of the value, followed by whitespace, non-whitespace and whitespace.
(S\d{2}\s+E\d{2}  ##Open the one and only capturing group, which matches:
                  ##S followed by 2 digits, then whitespace, then E and 2 digits.
  (?=\s+\d{4})    ##Positive lookahead: make sure the above is followed by whitespace and 4 digits.
)                 ##Close the capturing group.
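A quick Python check of this pattern (Python is my choice; `episode` is a hypothetical helper). As written, the pattern expects a one-word title and a 4-digit year after the episode; lines that don't fit simply produce no match rather than an empty group:

```python
import re

pattern = re.compile(r"^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))", re.MULTILINE)

def episode(line):
    # Hypothetical helper: the captured S/E part, or None if the line doesn't match.
    m = pattern.search(line)
    return m.group(1) if m else None

print(episode("Movie TOTO S09 E22 2022 Copyright"))  # S09 E22
print(episode("Movie TOTO 2022 Copyright"))          # None
print(episode("Movie THE TOTO S09 E22 2022"))        # None (title is two words)
```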

CodePudding user response:

One idea is to make the dot lazy (.*?) and force it to match up to the end of the line ($) if the other part doesn't exist.

Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)

See this demo at regex101 (I also added some \s* around the captures so they don't include the surrounding spaces).
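A small Python sketch of the lazy-dot approach (Python and the `parse` helper name are my assumptions). When the S/E part is missing, the `$` alternative wins, so group 2 is an empty string rather than `None`:

```python
import re

pattern = re.compile(r"Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)", re.MULTILINE)

def parse(line):
    # Hypothetical helper: (title, episode) or None if there is no match.
    m = pattern.search(line)
    return m.groups() if m else None

print(parse("Movie TOTO S09 E22 2022 Copyright"))  # ('TOTO', 'S09 E22')
print(parse("Movie TOTO 2022 Copyright"))          # ('TOTO 2022 Copyright', '')
```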

CodePudding user response:

There are several errors in your regex:

  • The blank space after Movie is not taken into account.
  • (.*) is greedy and matches everything after Movie.

Try online at https://regex101.com/

(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)
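The same pattern tried in Python as a sketch (the language choice is mine). Each group keeps its trailing whitespace, and the second group only takes a single word, so longer titles would spill into the S/E and remainder groups:

```python
import re

pattern = re.compile(r"(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)")

with_episode = pattern.search("Movie TOTO S09 E22 2022 Copyright").groups()
without_episode = pattern.search("Movie TOTO 2022 Copyright").groups()

print(with_episode)     # ('Movie ', 'TOTO ', 'S09 E22 ', '2022 Copyright')
print(without_episode)  # ('Movie ', 'TOTO ', None, '2022 Copyright')
```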