Python regex to retrieve multiple values with various given words between them


I need to retrieve dimensions from text, where they can be specified in several ways:

  • "... 10.5 inches x 5 feet x 2 inches ..."
  • "... 10 inches x 5 inches ..."
  • "... 10 inches x 5 inches ..."
  • "... 10 x 5 inches ..."

There can be two or three dimensions, and each one may or may not have its own unit of measurement.

I am struggling to add the list of optional units and to make the search for the third value optional in the regex:

dimensions = re.findall(r'(\d+\.?\d*)\s*inches?feet?\s*x\s*(\d+\.?\d*)\s*inches?feet?\s*x?\s*(\d+\.?\d*)?\s*inches?feet?', string)

CodePudding user response:

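What you have is inches?feet?. Each ? applies only to the single character before it, so this pattern means "inche", an optional "s", "fee", and an optional "t", in that order. It would match something like "5 inchesfeet", but never "5 inches" or "5 feet" on their own.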
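As a quick check, here is a minimal sketch of what inches?feet? accepts on its own. The sample strings are made up for illustration:

```python
import re

unit = re.compile(r'inches?feet?')

# Each ? makes only the single preceding character optional,
# so the literal runs "inche" and "fee" are both required.
samples = ['inches', 'feet', 'inchesfeet', 'inchefee']
results = {s: bool(unit.fullmatch(s)) for s in samples}
print(results)
```

Only the strings that contain both words back to back are accepted, which is why the original pattern fails on the sample inputs.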

You were fairly close. The key idea you missed is that | specifies alternatives to match: (?:inches|feet)?. The alternatives are wrapped in a non-capturing group so that | chooses only between "inches" and "feet" rather than splitting the whole pattern around it. The ? at the end makes the entire group optional.
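To illustrate why the group matters, the sketch below compares the alternation with and without the surrounding (?:...). The test string and variable names are assumptions for illustration:

```python
import re

# Without a group, | splits the whole pattern into two alternatives:
# "\d+\s*inches" OR "feet".
ungrouped = re.compile(r'\d+\s*inches|feet')

# With a non-capturing group, | only chooses between the two unit words.
grouped = re.compile(r'\d+\s*(?:inches|feet)?')

text = '5 feet'
ungrouped_match = ungrouped.search(text).group()
grouped_match = grouped.search(text).group()
print(repr(ungrouped_match), repr(grouped_match))
```

The ungrouped pattern loses the number because its second alternative is just the bare word, while the grouped pattern keeps the number and the unit together.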

To make the entire third dimension optional, the pattern for it can be put in a non-capturing group, and then that group can be made optional with ?:

(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?

The final regex is

(\d+\.?\d*)\s*(?:inches|feet)?\s*x\s*(\d+\.?\d*)\s*(?:inches|feet)?\s*(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?
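A minimal sketch of this final pattern applied to the sample strings from the question (the variable names are assumptions for illustration). Because the pattern has capturing groups, re.findall returns one tuple per match, and a third dimension that is absent comes back as an empty string:

```python
import re

pattern = re.compile(
    r'(\d+\.?\d*)\s*(?:inches|feet)?\s*x\s*'
    r'(\d+\.?\d*)\s*(?:inches|feet)?\s*'
    r'(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?'
)

samples = [
    '... 10.5 inches x 5 feet x 2 inches ...',
    '... 10 inches x 5 inches ...',
    '... 10 x 5 inches ...',
]

# One list of (first, second, third) tuples per sample string.
results = [pattern.findall(s) for s in samples]
for s, r in zip(samples, results):
    print(s, '->', r)
```

Note that only the numbers are captured here. If the units are needed as well, one option would be to turn (?:inches|feet) into a capturing group.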

CodePudding user response:

Here is one re.findall approach that works:

import re

inp = """... 10 inches x 5 feet x 2 inches ...
... 10 inches x 5 inches ...
... 10 inches x 5 inches ...
... 10 x 5 inches ..."""

# No capturing groups, so findall returns each full match as a string.
dims = re.findall(r'\d+(?:\.\d+)?(?: (?!x\b)\w+)?(?: x \d+(?:\.\d+)?(?: (?!x\b)\w+)?)*', inp)
print(dims)

This prints:

['10 inches x 5 feet x 2 inches',
 '10 inches x 5 inches',
 '10 inches x 5 inches',
 '10 x 5 inches']

Here is an explanation of the regex pattern being used:

\d+(?:\.\d+)?           match a number
(?:
    [ ]                 followed by a space
    (?!x\b)             NOT followed by 'x'
    \w+                 but followed by a unit word
)?                      the unit is optional
(?:
    [ ]                 space
    x                   'x'
    [ ]                 space
    \d+(?:\.\d+)?       another number
    (?: (?!x\b)\w+)?    optionally followed by its unit
)*                      zero or more further dimensions
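The annotated layout above maps closely onto Python's re.VERBOSE mode, where whitespace in the pattern is ignored and # starts a comment. That is also why a literal space has to be written as [ ]. The following is a sketch of the same pattern in that style; the name DIMS and the sample input are assumptions for illustration:

```python
import re

DIMS = re.compile(r"""
    \d+(?:\.\d+)?           # a number, with an optional decimal part
    (?:
        [ ]                 # a literal space
        (?!x\b)             # not followed by the separator 'x'
        \w+                 # the unit word
    )?                      # the unit is optional
    (?:
        [ ]x[ ]             # the ' x ' separator
        \d+(?:\.\d+)?       # another number
        (?:[ ](?!x\b)\w+)?  # optionally followed by its unit
    )*                      # zero or more further dimensions
""", re.VERBOSE)

inp = "... 10.5 inches x 5 feet x 2 inches ...\n... 10 x 5 inches ..."
matches = DIMS.findall(inp)
print(matches)
```

Keeping the comments inside the pattern means the explanation and the regex cannot drift apart.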