I would like to create a regular expression that matches all possible episode numbering formats found in TV show file names.
I currently have this regex, which matches most but not all of my examples:
(?:(?<=e)|(?<=episode)|(?<=episode[\.\s]))(\d{1,2})|((?<=-)\d{1,2})
The one it does not match is when two episode numbers directly follow one another: e0102 should match 01 and 02.
You can find the regex example with test cases here
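To reproduce the problem outside the online tester, here is a minimal sketch. It assumes Python's `re` module and case-insensitive matching, neither of which is stated in the question, and `find_episodes` is a hypothetical helper name used only for illustration.

```python
import re

# Pattern copied verbatim from the question.
# Assumption: re.IGNORECASE, so that "E01" and "e01" behave the same.
ORIGINAL_RE = re.compile(
    r"(?:(?<=e)|(?<=episode)|(?<=episode[\.\s]))(\d{1,2})|((?<=-)\d{1,2})",
    re.IGNORECASE,
)

def find_episodes(name):
    """Return every digit run the original pattern matches in `name`."""
    return [m.group(0) for m in ORIGINAL_RE.finditer(name)]

print(find_episodes("e01-02"))  # both numbers are found
print(find_episodes("e0102"))   # only the first number is found
```

The second number in e0102 is missed because it is preceded by a digit rather than by 'e', 'episode' or a hyphen, so none of the lookbehinds succeed.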
CodePudding user response:
As per your comment, I went by the following assumptions:
- Episode numbers are never more than three digits long;
- Episode strings will therefore have either 1-3 digits, or 4 or 6 digits when they are meant to be a range of episodes;
- There is never a run of 5 digits, assuming the same padding is used for both numbers in a range of episodes;
- This means that a run of either 4 or 6 digits needs to be split evenly.
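Spelled out without any regex, these assumptions amount to something like the sketch below. The function name `split_episode_digits` and the choice to raise on unexpected lengths are illustrative assumptions, not part of the pattern itself.

```python
def split_episode_digits(digits):
    """Split a run of digits into episode numbers under the assumptions above.

    Hypothetical helper for illustration only.
    """
    if not digits.isdigit():
        raise ValueError(f"not a digit string: {digits!r}")
    if len(digits) <= 3:
        # 1-3 digits: a single episode number.
        return [digits]
    if len(digits) in (4, 6):
        # 4 or 6 digits: two equally padded numbers, split down the middle.
        half = len(digits) // 2
        return [digits[:half], digits[half:]]
    # 5 digits (or more than 6) is assumed never to occur.
    raise ValueError(f"unexpected number of digits: {digits!r}")

print(split_episode_digits("0102"))
print(split_episode_digits("001002"))
```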
Therefore, try the following:
e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)
Here is an online demo. You'll notice I added some more samples to showcase the above assumptions.
e(?:pisode)?\s*
- Match either 'e' or 'episode', followed by zero or more whitespace characters;
(\d{1,3}(?!\d)|\d\d\d??)
- A 1st capture group that catches 1-3 digits if they are not followed by any other digit, or otherwise two digits plus a lazily matched optional third;
(?:-?e?(\d{1,3}))?
- An optional non-capture group with a nested 2nd capture group, looking for an optional hyphen and an optional literal 'e' followed by 1-3 digits;
(?!\d)
- A negative lookahead asserting that no trailing digit is left.
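As a quick way to try the pattern locally, here is a minimal sketch in Python. The `re.IGNORECASE` flag, the helper name `episode_numbers` and the sample file names are assumptions added for illustration; the pattern itself is unchanged.

```python
import re

# Pattern copied from above.
# Assumption: re.IGNORECASE, so upper-case markers such as "E01" also match.
EPISODE_RE = re.compile(
    r"e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)",
    re.IGNORECASE,
)

def episode_numbers(name):
    """Return the episode numbers from the first match in `name`, as strings."""
    match = EPISODE_RE.search(name)
    if match is None:
        return []
    # Group 2 is None when the name holds a single episode.
    return [g for g in match.groups() if g is not None]

# Hypothetical file names, used only to exercise the pattern.
for sample in [
    "show.s01e05.mkv",
    "show.s01e0102.mkv",
    "show.s01e01-02.mkv",
    "show episode 7.mkv",
]:
    print(sample, episode_numbers(sample))
```

Note that `search` only returns the first match, and that any 'e' directly followed by digits elsewhere in a file name would also match, so real file names may need a stricter prefix such as the season marker.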