Python regex to retrieve multiple values with various given words between them

Time:06-30

I need to retrieve dimensions from text, where they can be written in several ways:

  • "... 10.5 inches x 5 feet x 2 inches ..."
  • "... 10 inches x 5 inches ..."
  • "... 10 inches x 5 inches ..."
  • "... 10 x 5 inches ..."

There can be two or three dimensions, and each one may or may not have its own unit.

I am struggling to add the list of optional units and to make the match of the third value optional in the regex:

dimensions = re.findall(r'(\d+\.?\d*)\s*inches?feet?\s*x\s*(\d+\.?\d*)\s*inches?feet?\s*x?\s*(\d+\.?\d*)?\s*inches?feet?',string)

CodePudding user response:

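What you have is inches?feet?. The ? applies only to the single character before it, so this says "match 'inche', an optional 's', then 'fee', then an optional 't'". This means it matches something like "5 inchesfeet", but never "inches" or "feet" on its own.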

You were fairly close. The key idea you missed is that | can be used to specify alternatives to match: (?:inches|feet)?. The alternatives are put in a non-capturing group so that the alternation covers only "inches" and "feet", and not everything before and after the |. The ? at the end makes the entire group optional.
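
A quick way to see the difference is to test both spellings against a few unit strings. This is only a sketch; the names broken and fixed are made up for the illustration:

```python
import re

# The question's spelling: each "?" binds only to the character right before it.
broken = re.compile(r'inches?feet?')
# The suggested spelling: one optional group holding two alternatives.
fixed = re.compile(r'(?:inches|feet)?')

for text in ['inches', 'feet', '', 'inchesfeet']:
    # Columns: input, matched by broken, matched by fixed
    print(repr(text), bool(broken.fullmatch(text)), bool(fixed.fullmatch(text)))

# 'inches' False True
# 'feet' False True
# '' False True
# 'inchesfeet' True False
```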

To make the entire third dimension optional, the pattern for it can be put in a non-capturing group, and then that group can be made optional with ?:

(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?

The final regex is

(\d+\.?\d*)\s*(?:inches|feet)?\s*x\s*(\d+\.?\d*)\s*(?:inches|feet)?\s*(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?
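
As a rough check, the final regex can be run with re.findall over the sample strings from the question. The names DIMENSIONS and samples are assumptions made for this sketch. Note that only the numbers are captured, and a missing third dimension comes back as an empty string:

```python
import re

# The final regex from above, split across lines only for readability.
DIMENSIONS = re.compile(
    r'(\d+\.?\d*)\s*(?:inches|feet)?\s*x\s*'
    r'(\d+\.?\d*)\s*(?:inches|feet)?\s*'
    r'(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?'
)

samples = [
    "... 10.5 inches x 5 feet x 2 inches ...",
    "... 10 inches x 5 inches ...",
    "... 10 x 5 inches ...",
]

for text in samples:
    print(DIMENSIONS.findall(text))

# [('10.5', '5', '2')]
# [('10', '5', '')]
# [('10', '5', '')]
```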

CodePudding user response:

Here is one working re.findall approach:

import re

inp = """... 10 inches x 5 feet x 2 inches ...
... 10 inches x 5 inches ...
... 10 inches x 5 inches ...
... 10 x 5 inches ..."""

# No capture groups, so findall returns each whole match as a string
dims = re.findall(r'\d+(?:\.\d+)?(?: (?!x\b)\w+)?(?: x \d+(?:\.\d+)?(?: (?!x\b)\w+)?)*', inp)
print(dims)

This prints:

['10 inches x 5 feet x 2 inches',
 '10 inches x 5 inches',
 '10 inches x 5 inches',
 '10 x 5 inches']

Here is an explanation of the regex pattern being used:

\d+(?:\.\d+)?           match a number, with an optional decimal part
(?:
    [ ]                 followed by a space
    (?!x\b)             NOT followed by the separator 'x'
    \w+                 but followed by a unit word
)?                      the unit is optional
(?:
    [ ]                 space
    x                   'x'
    [ ]                 space
    \d+(?:\.\d+)?       another number
    (?: (?!x\b)\w+)?    with its own optional unit
)*                      repeated zero or more times
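
Since this pattern returns each match as one string, a second step is needed to get at the individual values. One possible sketch, assuming a hypothetical helper parse_dimensions and a second hypothetical pattern PART_RE (neither is part of the answer above), splits every match on ' x ' and separates the number from its optional unit:

```python
import re

# Pattern from the answer above.
DIM_RE = re.compile(
    r'\d+(?:\.\d+)?(?: (?!x\b)\w+)?(?: x \d+(?:\.\d+)?(?: (?!x\b)\w+)?)*'
)
# Hypothetical helper pattern: one number plus an optional unit word.
PART_RE = re.compile(r'(\d+(?:\.\d+)?)(?: (\w+))?')


def parse_dimensions(text):
    """Return one list of (value, unit) pairs per dimension string found."""
    results = []
    for match in DIM_RE.findall(text):
        parts = []
        for piece in match.split(' x '):
            m = PART_RE.fullmatch(piece)
            parts.append((float(m.group(1)), m.group(2)))
        # The trailing group uses "*", so a lone number also matches;
        # keep only matches that contain at least two dimensions.
        if len(parts) >= 2:
            results.append(parts)
    return results


print(parse_dimensions("... 10.5 inches x 5 feet x 2 inches ..."))
# [[(10.5, 'inches'), (5.0, 'feet'), (2.0, 'inches')]]
print(parse_dimensions("... 10 x 5 inches ..."))
# [[(10.0, None), (5.0, 'inches')]]
```

A dimension without a unit gets None here; whether it should inherit the unit of a neighbouring dimension depends on the data and is left open.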