Python - Regex capturing a word right after a specific word

Time:01-06

I am trying to use a regex to find the location in a document. However, I am having trouble capturing only the location part of the text.

For example, for the text: LOCATION 03-ED-50-39.5/48.7 DIVISION HIGHWAY ROAD 44 CONTRACT ITEMS, we would only want LOCATION 03-ED-50-39.5/48.7.

Currently, I have the following code:

LOCATION\s+(\d+)

We know that the location string starts with a digit and ends with a digit with no space. Is there a way to capture the entire word/string right next to the location? Any help would be much appreciated. Thanks!

CodePudding user response:

Like this using \K and GNU grep:

grep -oP '^LOCATION\s+\K\S+' file

With Perl:

perl -lne 'print for /^LOCATION\s+\K\S+/' file

With Python (using positive look behind):

>>> import re
>>> s = 'LOCATION   03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD   44 CONTRACT ITEMS'
>>> pattern = r'(?<=LOCATION\s{3})\S+'
>>> matches = re.finditer(pattern, s)
>>> for match in matches:
...     print(match.group())
... 
03-ED-50-39.5/48.7

Output

03-ED-50-39.5/48.7
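
Python's re module only accepts fixed-width look-behinds, which is why the pattern above hard-codes \s{3} and so only matches when exactly three whitespace characters follow LOCATION. A capturing group avoids that restriction. The following is a minimal sketch of that alternative, not the answerer's own code; the names LOCATION_RE and find_location are made up for illustration.

```python
import re

# Sample text from the question (with the spacing used in the answer).
s = "LOCATION   03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD   44 CONTRACT ITEMS"

# \s+ accepts any amount of whitespace after the keyword;
# the group (\S+) captures the next run of non-whitespace characters.
LOCATION_RE = re.compile(r"LOCATION\s+(\S+)")


def find_location(text):
    """Return the token right after 'LOCATION', or None if there is no match."""
    m = LOCATION_RE.search(text)
    return m.group(1) if m else None


print(find_location(s))
```

Because the whitespace is matched outside the group, the same pattern works whether the keyword is followed by one space, several spaces, or a tab.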

The regular expression matches as follows:

Node        Explanation
^           the beginning of the string
LOCATION    the literal text 'LOCATION'
\s+         whitespace (\n, \r, \t, \f, and " "), 1 or more times (matching as much as possible)
\K          resets the start of the match (what is Kept); a shorter alternative to a look-behind assertion (see the PerlMonks notes on look-arounds and "Support of \K in regex")
\S+         non-whitespace (all but \n, \r, \t, \f, and " "), 1 or more times (matching as much as possible)
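
Note that Python's built-in re module does not support \K, so the grep and Perl pattern cannot be reused verbatim there. As a rough equivalent, the same nodes can be written as a commented pattern with re.VERBOSE, using a capturing group in place of \K. This is only a sketch: the names pattern and locations, and the second sample line, are invented for illustration.

```python
import re

pattern = re.compile(
    r"""
    ^LOCATION   # the literal text 'LOCATION' at the beginning of a line
    \s+         # one or more whitespace characters
    (\S+)       # capture the following run of non-whitespace characters
    """,
    re.VERBOSE | re.MULTILINE,  # MULTILINE makes ^ match at every line start
)


def locations(text):
    """Return the captured token from every line that starts with LOCATION."""
    return pattern.findall(text)


# Hypothetical multi-line input; the second location is made up.
doc = (
    "LOCATION 03-ED-50-39.5/48.7 DIVISION HIGHWAY ROAD 44 CONTRACT ITEMS\n"
    "SOME OTHER LINE\n"
    "LOCATION   04-XX-10-1.0/2.0 MORE TEXT\n"
)
print(locations(doc))
```

With a single capturing group, findall returns just the captured strings, which plays the same role as \K discarding the LOCATION prefix in the grep and Perl versions.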