I'm trying to parse the team names out of a bunch of semi-unpredictable strings. I'm using Ruby, but I don't think that should matter much. This is a contrived example, but some sample strings are:
Eagles vs Bears
NFL Matchup: Philadelphia Eagles VS Chicago Bears TUNE IN
NFL Matchup: Philadelphia Eagles VS Chicago Bears - TUNE IN
Philadelphia Eagles vs Chicago Bears - NFL Match
Phil.Eagles vs Chic.Bears
3agles vs B3ars
The regex I've come up with is
/([0-9A-Z .]*) vs ([0-9A-Z .]*)(?:[ -:]*tune)?/i
but in the case of "NFL Matchup: Philadelphia Eagles VS Chicago Bears TUNE IN" I'm receiving "Chicago Bears TUNE" as the second match. I'm trying to separate out "tune in" so that it's in its own group.
I thought that adding (?:[ -:]*tune)? would split off the ending portion of the expression the same way that having "vs" in the middle does, but that doesn't seem to be the case. If I remove the ? at the end, it matches correctly for the above example, but it no longer matches "Eagles vs Bears".
If anyone could help me, I would greatly appreciate it if you could break down your regex piece by piece.
CodePudding user response:
You can capture the second group up to a "-", ":" or "tune" preceded by zero or more whitespace characters, or up to the end of the line, while making the second group pattern lazy:
([\w .]*) vs ([\w .]*?)(?=\s*(?:[:-]|tune|$))
See the regex demo.
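As a rough sketch of how this pattern could be used from Ruby, the snippet below runs it over the sample strings from the question. The names TEAMS and parse_teams are hypothetical, chosen only for illustration. Two assumptions are made here: the /i flag from the question is kept so that "VS" and "TUNE" still match, and the captures are stripped because group 1 can pick up a leading space after a prefix such as "NFL Matchup:".

```ruby
# Pattern from the answer, with the /i flag carried over from the question
# (assumption: matching should stay case-insensitive).
TEAMS = /([\w .]*) vs ([\w .]*?)(?=\s*(?:[:-]|tune|$))/i

# Hypothetical helper: returns [first_team, second_team], or nil if no match.
def parse_teams(title)
  match = TEAMS.match(title)
  return nil unless match

  # Group 1 may start with a space (e.g. right after "NFL Matchup:"), so strip.
  match.captures.map(&:strip)
end

samples = [
  "Eagles vs Bears",
  "NFL Matchup: Philadelphia Eagles VS Chicago Bears TUNE IN",
  "NFL Matchup: Philadelphia Eagles VS Chicago Bears - TUNE IN",
  "Philadelphia Eagles vs Chicago Bears - NFL Match",
  "Phil.Eagles vs Chic.Bears",
  "3agles vs B3ars"
]

samples.each { |title| p parse_teams(title) }
```

Stripping after the match keeps the pattern itself identical to the one above; trimming inside the regex is also possible but makes it harder to read.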
Details:
([\w .]*) - Group 1: zero or more word, space or . chars, as many as possible
 vs  - a " vs " string (with a space on each side)
([\w .]*?) - Group 2: zero or more word, space or . chars, as few as possible
(?=\s*(?:[:-]|tune|$)) - a positive lookahead that requires the following pattern to appear immediately to the right of the current location:
    \s* - zero or more whitespaces
    (?:[:-]|tune|$) - ":" or "-", "tune", or end of a line.
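To see why laziness plus the lookahead makes the difference, here is a small Ruby comparison of the two patterns on the problematic string. This is only a sketch; the variable names are made up for illustration. In the question's pattern the second group is greedy and the trailing (?:...)? is optional, so the group is free to consume the trailing text itself and the optional part then matches nothing.

```ruby
title = "NFL Matchup: Philadelphia Eagles VS Chicago Bears TUNE IN"

# Pattern from the question: greedy second group, optional "tune" tail.
greedy = /([0-9A-Z .]*) vs ([0-9A-Z .]*)(?:[ -:]*tune)?/i

# Pattern from the answer: lazy second group, stopped by a lookahead.
lazy = /([\w .]*) vs ([\w .]*?)(?=\s*(?:[:-]|tune|$))/i

# String#[] with a regexp and a group number returns that capture (or nil).
greedy_second = title[greedy, 2]
lazy_second   = title[lazy, 2]

puts greedy_second # the greedy group keeps the trailing "TUNE ..." text
puts lazy_second   # the lazy group stops right before " TUNE"
```

The lookahead does not consume anything, so "tune" is only used as a stop condition; if the trailing text is needed as well, it could be captured by a third group placed after the lookahead.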