Extract units from string using regex in python

Time:02-20

I have a string like this:

str1='x[cm],z,y[km]'

I want to find the units present in the above variable. If I use re.findall(r"\[([A-Za-z0-9_/*^\\(\\)-\.]+)\]", str1), it gives ['cm', 'km'], but I want the output to be ['cm', '', 'km'] since z has no unit associated with it. How can I achieve this?

Similarly, for the input string T(x[g],y,z[m])[kg] the output should be ['g', '', 'm', 'kg'].

Answer:

You can split the string on commas, and the problem gets easier. To extract the unit from each substring you can write:

import re

def extract_unit(s: str) -> str:
    # Return the text between the first pair of square brackets, or '' if there is none
    match = re.search(r"\[(.*?)\]", s)
    return match.group(1) if match else ""

and to create the list you can add the following code:

l = [extract_unit(s) for s in str1.split(',')]
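For reference, here is a self-contained version of this approach that can be run as-is. The wrapper name extract_units is hypothetical (not from the answer), and typing.List is used only so the annotation also works on Python versions before 3.9. Note that splitting on commas yields at most one unit per field, so for the nested input the last field z[m])[kg] contributes only m, and the trailing kg is not returned.

```python
import re
from typing import List


def extract_unit(s: str) -> str:
    # Text between the first pair of square brackets, or '' if the field has none
    match = re.search(r"\[(.*?)\]", s)
    return match.group(1) if match else ""


def extract_units(text: str) -> List[str]:
    # Hypothetical wrapper: one (possibly empty) unit per comma-separated field
    return [extract_unit(field) for field in text.split(",")]


print(extract_units("x[cm],z,y[km]"))       # ['cm', '', 'km']
print(extract_units("T(x[g],y,z[m])[kg]"))  # ['g', '', 'm'] (trailing [kg] is missed)
```

This works well for the flat case; the nested case needs one of the other approaches.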

Answer:

You can use this regex:

((?<!\])|(?<=\[)[^\[\],]*)\]?(?:,|\)|$)

Explanation:

(             # open capturing group
  (?<!\])     #   the match is not preceded by a closing square bracket
              #   match an empty string
|             # OR
  (?<=\[)     #   the match is preceded by an opening square bracket
  [^\[\],]*   #   match zero or more characters that are neither square brackets nor commas
)             # close capturing group
\]?           # consume an optional closing square bracket
(?:,|\)|$)    # consume a comma or a closing parenthesis, or match the end of the string

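Because the pattern has a single capturing group, re.findall returns that group's content for each match, which is the empty string for fields without a unit.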
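A minimal runnable check of this pattern, assuming Python 3.7 or later. Writing it with re.VERBOSE so the explanation sits next to each piece is our own restatement, not part of the original answer; after whitespace and comments are stripped it is the same regex as above. The name UNIT_PATTERN is hypothetical.

```python
import re

# Same regex as above, laid out with re.VERBOSE (whitespace and # comments are ignored)
UNIT_PATTERN = re.compile(
    r"""
    (                 # group 1: the unit, possibly empty
        (?<!\])       #   empty match not preceded by a closing bracket (field without a unit)
    |
        (?<=\[)       #   or, right after an opening bracket ...
        [^\[\],]*     #   ... everything up to the next bracket or comma
    )
    \]?               # optional closing bracket
    (?:,|\)|$)        # followed by a comma, a closing parenthesis, or the end of the string
    """,
    re.VERBOSE,
)

print(UNIT_PATTERN.findall("x[cm],z,y[km]"))       # ['cm', '', 'km']
print(UNIT_PATTERN.findall("T(x[g],y,z[m])[kg]"))  # ['g', '', 'm', 'kg']
```

Unlike the split-based answer, this also picks up the unit that follows the closing parenthesis in the nested example.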

Answer:

This fix is certainly not regex-expert level. Still, you can preprocess your input to add empty brackets to the fields that have no unit; then a simple regex catches what you want.

import re

str1 = 'T(x[g],y,z[m])[kg]'

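# Append a "[ ]" placeholder to every comma-separated field that has no unit
str1 = ''.join([x if '[' in x else x + '[ ]' for x in str1.split(',')])

# Capture the word/space characters between each pair of square brackets
print(re.findall(r'\[([\w\s]+)\]', str1))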

Output:

['g', ' ', 'm', 'kg']
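Note that this prints ' ' rather than the '' asked for in the question. A small variant, sketched here under our own assumptions, inserts an empty "[]" placeholder and uses * instead of + so the placeholder yields ''. It also swaps [\w\s] for [^\[\]] (anything except brackets) so that units such as m/s are captured; the helper name units_via_placeholder is hypothetical.

```python
import re
from typing import List


def units_via_placeholder(text: str) -> List[str]:
    # Give every comma-separated field without a unit an empty "[]" placeholder
    fields = [f if "[" in f else f + "[]" for f in text.split(",")]
    padded = ",".join(fields)
    # '*' (not '+') lets the empty placeholder match and produce ''
    return re.findall(r"\[([^\[\]]*)\]", padded)


print(units_via_placeholder("T(x[g],y,z[m])[kg]"))  # ['g', '', 'm', 'kg']
print(units_via_placeholder("x[cm],z,y[km]"))       # ['cm', '', 'km']
```

Joining with "," instead of "" is not required for the regex, but it keeps the padded string readable when debugging.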