Parse letters separated by "x" using regex

Time:03-29

I am looking to parse patterns like "w x h x l" using regex, so basically the letters w,h,l (and others) with "x" in between. There could be text around the searched expression, and "w x h x l x l x h" would be valid as well.

I have tried the regular expression
(w|h|l|b)(\s*x\s*(w|h|l|b))+
but I don't understand why this doesn't work.

Examples (with python's re.findall):
"The measurements are (w x h x l): 5x7x3cm" => [(w,h,l)]
"Measurement options are (wxhxl), (hxlxb): Some random stuff" => [(w,h,l),(h,l,b)]
"The measurements, in form wxhxl: 5x7x3cm" => [(w,h,l)]

CodePudding user response:

You can use your pattern with non-capturing groups to extract all matches, and then split each match with x to get the separate chars:

import re

texts = [
    "The measurements are (w x h x l): 5x7x3cm", # => [(w,h,l)]
    "Measurement options are (wxhxl), (hxlxb): Some random stuff", # => [(w,h,l),(h,l,b)]
    "The measurements, in form wxhxl: 5x7x3cm" # => [(w,h,l)] 
]
for text in texts:
    # Find each full "w x h x l"-style match, strip whitespace, then split on "x"
    print([tuple(''.join(x.split()).split('x')) for x in re.findall(r'\b[whlb](?:\s*x\s*[whlb])+\b', text)])

Output:

[('w', 'h', 'l')]
[('w', 'h', 'l'), ('h', 'l', 'b')]
[('w', 'h', 'l')]

The \b[whlb](?:\s*x\s*[whlb])+\b pattern matches:

  • \b - word boundary
  • [whlb] - a w, h, l or b char
  • (?:\s*x\s*[whlb])+ - one or more repetitions of an x enclosed with zero or more whitespaces and then a w, h, l or b char
  • \b - word boundary
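
As for why the capturing version from the question misbehaves: when a pattern contains capturing groups, re.findall returns the group values instead of the whole match, and a repeated group only keeps its last repetition. A minimal sketch of the difference, assuming the question's pattern ended with a + quantifier (the variable names are only for illustration):

```python
import re

text = "Measurement options are (wxhxl), (hxlxb): Some random stuff"

# Pattern from the question (assumed to end with +): three capturing groups.
capturing = r'(w|h|l|b)(\s*x\s*(w|h|l|b))+'
# Same idea with a character class and a non-capturing group.
non_capturing = r'\b[whlb](?:\s*x\s*[whlb])+\b'

# With groups, findall returns one tuple of group values per match,
# and the repeated groups hold only their last repetition.
print(re.findall(capturing, text))      # [('w', 'xl', 'l'), ('h', 'xb', 'b')]

# Without groups, findall returns the full matched substrings.
print(re.findall(non_capturing, text))  # ['wxhxl', 'hxlxb']
```

The middle letters are lost in the first result, which is why the answer switches to a non-capturing group and splits each full match afterwards.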

CodePudding user response:

If you can make use of the PyPI regex module, you can use a named capture group and read all of its captures:

\b(?<pat>[whlb])(?:\s*x\s*(?<pat>[whlb]))+\b
  • \b A word boundary to prevent a partial word match
  • (?<pat>[whlb]) Group pat match one of w h l b
  • (?: Non capture group to repeat as a whole
    • \s*x\s*(?<pat>[whlb]) Match an x between optional whitespace chars and again named capture group pat
  • )+ Close the non capture group and repeat it 1 or more times to match at least a single x
  • \b A word boundary

In Python, using the regex module:

import regex

# Both groups share the name "pat", so m.captures("pat") collects every letter
pattern = r'\b(?<pat>[whlb])(?:\s*x\s*(?<pat>[whlb]))+\b'
s = ("The measurements are (w x h x l): 5x7x3cm\n"
            "The measurements, in form wxhxl: 5x7x3cm\n"
            "Measurement options are (wxhxl), (hxlxb): Some random stuff\n"
            "w x h x l x l x h")

for m in regex.finditer(pattern, s):
    print(tuple(m.captures("pat")))

Output

('w', 'h', 'l')
('w', 'h', 'l')
('w', 'h', 'l')
('h', 'l', 'b')
('w', 'h', 'l', 'l', 'h')
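
If installing the third-party regex module is not an option, a similar result can probably be had with the standard re module alone. The sketch below is not part of either answer: it reuses the non-capturing pattern and then picks the letters out of each full match. The names DIMS and parse_dims are hypothetical.

```python
import re

# Non-capturing pattern from the first answer; DIMS is a hypothetical name.
DIMS = re.compile(r'\b[whlb](?:\s*x\s*[whlb])+\b')

def parse_dims(text):
    """Return one tuple of letters per matched 'w x h ...' sequence."""
    # 'x' is not in [whlb], so collecting [whlb] chars from each full match
    # yields exactly the letters that were separated by x.
    return [tuple(re.findall(r'[whlb]', m.group())) for m in DIMS.finditer(text)]

print(parse_dims("The measurements are (w x h x l): 5x7x3cm"))  # [('w', 'h', 'l')]
print(parse_dims("w x h x l x l x h"))  # [('w', 'h', 'l', 'l', 'h')]
```

This only works because the separator x can never be one of the wanted letters; if x were added to the character class, the split-based or captures-based approaches above would be the safer choice.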