REGEX Python Second Match

Time:01-18

I am trying to extract the second match of the pattern "LOCATION\s+\S+" from the following text:

 PAGE    1
​
                BID OPENING DATE    07/25/18    FROM 0.2 MILES WEST OF ICE HOUSE        07/26/18 CONTRACT NUMBER    03-2F1304   ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A '
​
            LOCATION    03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD   44 CONTRACT ITEMS
​
        INSTALL SANDTRAPS AND PULLOUTS  FEDERAL AID ACNH-P050-(146)E
​
PAGE    1
​
                    BID OPENING DATE    07/25/18    IN EL DORADO COUNTY AT VARIOUS          07/26/18 CONTRACT NUMBER     03-2H6804  LOCATIONS ALONG ROUTES 49 AND 193   CONTRACT CODE 'C ' LOCATION 03-ED-0999-VAR          13 CONTRACT ITEMS
​
​
​
        TREE REMOVAL    FEDERAL AID NONE
​
PAGE    1
​
                BID OPENING DATE    07/25/18    IN LOS ANGELES, INGLEWOOD AND       07/26/18 CONTRACT NUMBER    07-296304   CULVER CITY, FROM I-105 TO PORT CONTRACT CODE 'B '
​
            LOCATION    07-LA-405-R21.5/26.3    ROAD UNDERCROSSING  55 CONTRACT ITEMS
​
​
​
        ROADWAY SAFETY IMPROVEMENT  FEDERAL AID ACIM-405-3(056)E

I am trying to get LOCATION 03-ED-0999-VAR (the second match) from the text. Is there a way to specify that we want the second, third, or nth match in Python? Right now, I have the following code:

# imports
import os
import pandas as pd
import re
import docx2txt
import textract
import antiword

# Triple quotes are needed because the text spans several lines and contains single quotes
text = """ PAGE    1

                BID OPENING DATE    07/25/18    FROM 0.2 MILES WEST OF ICE HOUSE        07/26/18 CONTRACT NUMBER    03-2F1304   ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A '

            LOCATION    03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD   44 CONTRACT ITEMS

        INSTALL SANDTRAPS AND PULLOUTS  FEDERAL AID ACNH-P050-(146)E

PAGE    1

                    BID OPENING DATE    07/25/18    IN EL DORADO COUNTY AT VARIOUS          07/26/18 CONTRACT NUMBER     03-2H6804  LOCATIONS ALONG ROUTES 49 AND 193   CONTRACT CODE 'C ' LOCATION 03-ED-0999-VAR          13 CONTRACT ITEMS



        TREE REMOVAL    FEDERAL AID NONE

PAGE    1

                BID OPENING DATE    07/25/18    IN LOS ANGELES, INGLEWOOD AND       07/26/18 CONTRACT NUMBER    07-296304   CULVER CITY, FROM I-105 TO PORT CONTRACT CODE 'B '

            LOCATION    07-LA-405-R21.5/26.3    ROAD UNDERCROSSING  55 CONTRACT ITEMS



        ROADWAY SAFETY IMPROVEMENT  FEDERAL AID ACIM-405-3(056)E"""

location1 = re.search(r'LOCATION\s+\S+', text)

CodePudding user response:

re.search() returns only the first match. You could use re.findall() instead: it returns all non-overlapping matches as a list, so you can pick whichever one you want by index and also count how many there are.

# \s+ matches one or more whitespace characters, \S+ one or more non-whitespace characters
locations = re.findall(r"LOCATION\s+\S+", text)
print(len(locations))  # Number of matches found
print(locations[1])    # Second match (list indices start at 0)
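If you only need one particular match and the text is large, a lazy variant avoids building the full list. The sketch below is just one way to do it, not part of the original answer: `nth_match` is a hypothetical helper name, the pattern assumes the intended regex is `LOCATION\s+\S+`, and `sample` is a shortened stand-in for the text in the question. It also returns None instead of raising IndexError when there are fewer than n matches.

```python
import re
from itertools import islice


def nth_match(pattern, text, n):
    """Return the n-th (1-based) match of pattern in text, or None if it does not exist.

    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # re.finditer yields match objects lazily, left to right
    matches = re.finditer(pattern, text)
    # Skip the first n - 1 matches and take the next one, if any
    found = next(islice(matches, n - 1, None), None)
    return found.group(0) if found else None


# Shortened stand-in for the text in the question (assumption: same layout)
sample = (
    "LOCATION    03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD\n"
    "LOCATIONS ALONG ROUTES 49 AND 193   CONTRACT CODE 'C ' "
    "LOCATION 03-ED-0999-VAR          13 CONTRACT ITEMS\n"
    "LOCATION    07-LA-405-R21.5/26.3    ROAD UNDERCROSSING\n"
)

second = nth_match(r"LOCATION\s+\S+", sample, 2)
print(second)  # LOCATION 03-ED-0999-VAR
```

Note that "LOCATIONS ALONG" is not picked up, because `\s+` requires at least one whitespace character directly after "LOCATION".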