Here is my regex so far. I'm having trouble matching the text before "v." and some other terms.
(?s)v\..*?(?:\d\d\d\d\))
Sample Text:
See Holiday in v. Marriot (2002)
e.g. FB v. Google (2012)
; Yahoo! v. Microsoft (2000)"
I need to be able to grab:
Holiday in v. Marriot (2002)
FB v. Google (2012)
Yahoo! v. Microsoft (2000)
CodePudding user response:
If the name before v. contains only a single capitalized word, you could start the match at an uppercase char, optionally followed by dots or other uppercase chars, and then match any chars except uppercase ones up to v.
(?s)\b[A-Z][A-Z.]*[^A-Z]*v\..*?\(\d{4}\)
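A minimal sketch of how this pattern behaves, assuming Python's re module and the sample text from the question with the trailing quote left out. The variable names are only illustrative. The last call shows the limitation mentioned above: with two capitalized words before v., the match starts at the last one.

```python
import re

# Sample text from the question (trailing quote omitted).
text = """See Holiday in v. Marriot (2002)
e.g. FB v. Google (2012)
; Yahoo! v. Microsoft (2000)"""

# Start at an uppercase char, allow more uppercase chars or dots,
# then any non-uppercase chars up to "v.", then lazily up to "(dddd)".
pattern = re.compile(r"(?s)\b[A-Z][A-Z.]*[^A-Z]*v\..*?\(\d{4}\)")

matches = pattern.findall(text)
for m in matches:
    print(m)

# Limitation: a second capitalized word before "v." moves the start of the match.
print(pattern.findall("See Holiday Inn v. Marriot (2002)"))
```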
Another option could be specifying the possible leading strings using an alternation | together with a capture group:
(?s)(?:\bSee\b|\be\.g\.|;)\s*(.*?\s+v\..*?\(\d{4}\))
(?s)  Inline modifier to have the dot match a newline
(?:\bSee\b|\be\.g\.|;)  Match one of the alternatives
\s*  Match optional whitespace chars
(  Capture group 1
.*?\s+v\.  Match as few chars as possible, then 1 or more whitespace chars and v.
.*?\(\d{4}\)  Match as few chars as possible and then 4 digits between parentheses
)  Close group 1
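The alternation approach can be tried out with the sketch below. It assumes Python's re module, the sample text from the question without the trailing quote, and that the quantifier after \s in the pattern is + (one or more whitespace chars before v.). Because the pattern has a single capture group, findall returns the group 1 values only, so the leading See, e.g. and ; are dropped.

```python
import re

# Sample text from the question (trailing quote omitted).
text = """See Holiday in v. Marriot (2002)
e.g. FB v. Google (2012)
; Yahoo! v. Microsoft (2000)"""

# Leading marker (See, e.g. or ;), optional whitespace, then group 1:
# everything up to "v." and on to the first 4-digit year in parentheses.
pattern = re.compile(r"(?s)(?:\bSee\b|\be\.g\.|;)\s*(.*?\s+v\..*?\(\d{4}\))")

citations = pattern.findall(text)
for c in citations:
    print(c)
```

Any other leading marker has to be added to the alternation by hand, which is the trade-off compared with the uppercase-based pattern.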
CodePudding user response:
Use
See\s+(.*)\s+\S+\s+(.*)\s+;\s+(.*)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  See                      'See'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  ;                        ';'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
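A short sketch of this pattern in use, assuming Python's re module, the sample text from the question without the trailing quote, and + quantifiers on \s and \S as the explanation describes (1 or more times). Note that the dot does not match a newline here, so each (.*) group stays on its own line.

```python
import re

# Sample text from the question (trailing quote omitted).
text = """See Holiday in v. Marriot (2002)
e.g. FB v. Google (2012)
; Yahoo! v. Microsoft (2000)"""

# One match over the whole three-line block; each citation lands in its own group.
pattern = re.compile(r"See\s+(.*)\s+\S+\s+(.*)\s+;\s+(.*)")

m = pattern.search(text)
groups = m.groups() if m else ()
for g in groups:
    print(g)
```

This pattern is tied to the exact See / one-word marker / ; layout of the sample, so it yields all three citations from a single match rather than one match per citation.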