I'm trying to make a substring optional. Here is the source:
Movie TOTO S09 E22 2022 Copyright
I want to optionally capture the substring: S09 E22
What I have tried so far:
/(Movie)(.*)(S\d\d\s*E\d\d)?/gmi
The problem is that it ends up matching S09 E22 2022 Copyright instead of just S09 E22:
Match 1: 0-33 Movie TOTO S09 E22 2022 Copyright
Group 1: 0-5 Movie
Group 2: 5-33 TOTO S09 E22 2022 Copyright
Is there any way to fix this issue?
Regards
Answer:
You get that match because the `.*` is greedy and first matches until the end of the string. Your `(S\d\d\s*E\d\d)?` is optional, so at that point it simply matches the empty string, the overall match succeeds, and the engine never backtracks into `.*`.
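To see this behaviour outside regex101, here is a small Python sketch (Python's `re` is only assumed as a test harness; the question's `/i` and `/m` flags map to `re.IGNORECASE | re.MULTILINE`, and `/g` has no counterpart for a single `search`). It reproduces the spans from the question: group 2 runs to the end of the string and group 3 is left unset.

```python
import re

# The question's regex; /i and /m become flags, /g has no equivalent for search().
pattern = re.compile(r"(Movie)(.*)(S\d\d\s*E\d\d)?", re.IGNORECASE | re.MULTILINE)

text = "Movie TOTO S09 E22 2022 Copyright"
m = pattern.search(text)

# The greedy .* has already consumed the rest of the line, so the optional
# group matches nothing and the overall match succeeds without backtracking.
print(m.span(2), repr(m.group(2)))  # (5, 33) ' TOTO S09 E22 2022 Copyright'
print(m.group(3))                   # None
```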
If you don't want partial matches for S09 or E22, the 4 digits for the year are not mandatory, and you can have movie titles longer than 1 word, then with PCRE you could use:
\b(Movie)\b\h+((?:(?!\h+[SE]\d+\b).)*)(?:\h(S\d+\h+E\d+))?
- `\b(Movie)\b` Capture the word Movie
- `\h+` Match 1 or more horizontal whitespace chars
- `(` Capture group
- `(?:` Non capture group to repeat as a whole part
- `(?!\h+[SE]\d+\b).` Match any character if neither an S01 nor an E22 part is directly to the right (where `[SE]` matches either an `S` or an `E` char, and `\h` matches a horizontal whitespace char)
- `)*` Close the non capture group and optionally repeat it
- `)` Close the capture group
- `(?:\h(S\d+\h+E\d+))?` Optionally capture the S01 E22 part (where `\d+` matches 1 or more digits)
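For trying the pattern above outside PCRE, a rough Python translation might look like this. Python's `re` has no `\h`, so this sketch assumes `[ \t]` is close enough for horizontal whitespace (PCRE's `\h` also covers other Unicode blanks). The second and third sample titles are made up to show the multi-word and the missing S/E cases.

```python
import re

# [ \t] stands in for PCRE's \h, which Python's re does not support.
pattern = re.compile(
    r"\b(Movie)\b[ \t]+((?:(?![ \t]+[SE]\d+\b).)*)(?:[ \t](S\d+[ \t]+E\d+))?"
)

samples = [
    "Movie TOTO S09 E22 2022 Copyright",
    "Movie THE BIG TOTO S09 E22 2022 Copyright",  # made-up multi-word title
    "Movie TOTO 2022 Copyright",                  # made-up, no S/E part
]

results = [pattern.search(s).groups() for s in samples]
for r in results:
    print(r)
# ('Movie', 'TOTO', 'S09 E22')
# ('Movie', 'THE BIG TOTO', 'S09 E22')
# ('Movie', 'TOTO 2022 Copyright', None)
```

The tempered dot `(?:(?!...).)*` stops group 2 right before the first space-plus-S/E token, which is why the title can contain spaces without swallowing the season/episode part.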
Another option, with a capture group for the S01 E22 part, or else match the rest of the line:
\b(Movie)\h+([^S\n]*(?:S(?!\d+\h+E\d+\b)[^S\n]*)*+)(S\d+\h+E\d+)?
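A Python sketch of this second pattern, under two assumptions: `[ \t]` replaces `\h`, and a plain `*` replaces the possessive `*+` (Python's `re` only accepts possessive quantifiers from 3.11 on). Since nothing follows the optional last group, the first attempt always succeeds, so dropping the possessive should not change the captures here. The second and third titles are made-up samples.

```python
import re

# [ \t] stands in for \h; plain * stands in for the possessive *+.
pattern = re.compile(
    r"\b(Movie)[ \t]+([^S\n]*(?:S(?!\d+[ \t]+E\d+\b)[^S\n]*)*)(S\d+[ \t]+E\d+)?"
)

samples = [
    "Movie TOTO S09 E22 2022 Copyright",
    "Movie SUPER TOTO S09 E22 2022 Copyright",  # made-up title containing an S
    "Movie TOTO 2022 Copyright",                # made-up, no S/E part
]

results = [pattern.search(s).groups() for s in samples]
for r in results:
    print(r)
# ('Movie', 'TOTO ', 'S09 E22')
# ('Movie', 'SUPER TOTO ', 'S09 E22')
# ('Movie', 'TOTO 2022 Copyright', None)
```

Note that group 2 keeps the trailing space before the S/E part, because `[^S\n]*` stops only at an `S`.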
Answer:
With your shown samples and attempts, please try the following regex.
^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))
Here is the online demo for the regex used.
Explanation: a detailed explanation of the regex used above.
^Movie\s+\S+\s+      ##Matching the string Movie at the start of the value, followed by 1+ spaces, 1+ non-spaces and 1+ spaces.
(S\d{2}\s+E\d{2}     ##Creating the one and only capturing group, which matches:
                     ##S followed by 2 digits, followed by 1+ spaces, followed by E and 2 digits.
(?=\s+\d{4})         ##Making sure with a positive lookahead that the previous part is followed by 1+ spaces and 4 digits.
)                    ##Closing the capturing group here.
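A quick Python check of this pattern (Python's `re` is only assumed as a test harness). It captures S09 E22 for the sample; the two made-up inputs show that `\S+` allows only a one-word title and that the lookahead requires the 4-digit year, so in those cases the whole match fails rather than leaving the group empty.

```python
import re

pattern = re.compile(r"^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))")

m = pattern.search("Movie TOTO S09 E22 2022 Copyright")
print(m.group(1))  # S09 E22

# Made-up inputs: a two-word title, and a line without the year.
print(pattern.search("Movie THE TOTO S09 E22 2022 Copyright"))  # None
print(pattern.search("Movie TOTO S09 E22 Copyright"))           # None
```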
Answer:
An idea is to make the dot lazy `.*?` and force it to match up to the end of the line `$` if the other part doesn't exist.
Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)
See this demo at regex101 (I also added some `\s*` around the captures to absorb the spaces).
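The same pattern can be tried in Python with a minimal sketch like this, assuming `re.MULTILINE` as the equivalent of the `/m` flag so that `$` matches before each newline. The second input is a made-up line without the S/E part, where the lazy group expands to the end and the alternation falls back to `$`.

```python
import re

# re.MULTILINE lets $ match before each newline, as with the /m flag.
pattern = re.compile(r"Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)", re.MULTILINE)

results = [
    pattern.search(text).groups()
    for text in ["Movie TOTO S09 E22 2022 Copyright", "Movie TOTO 2022 Copyright"]
]
print(results)
# [('TOTO', 'S09 E22'), ('TOTO 2022 Copyright', '')]
```

When `$` is the alternative that matches, group 2 is an empty string rather than `None`, since the group itself still participates in the match.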
Answer:
There are several errors in your regex:
- The blank space after `Movie` is not taken into account.
- `(.*)` matches everything after `Movie`.
Try online at https://regex101.com/
(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)
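A small Python check of this pattern (Python's `re` is only assumed as a test harness). For the question's sample it captures the S/E part in group 3. The second, made-up title shows that `(\w*\s*)` takes only one word, so with a two-word title the S/E part ends up in the last group instead.

```python
import re

pattern = re.compile(r"(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)")

one_word = pattern.search("Movie TOTO S09 E22 2022 Copyright").groups()
two_words = pattern.search("Movie THE TOTO S09 E22 2022 Copyright").groups()  # made-up title

print(one_word)   # ('Movie ', 'TOTO ', 'S09 E22 ', '2022 Copyright')
print(two_words)  # ('Movie ', 'THE ', None, 'TOTO S09 E22 2022 Copyright')
```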