I have a string like this:
str1='x[cm],z,y[km]'
I want to find the units available in the above variable.
If I use re.findall(r"\[([A-Za-z0-9_/*^\\(\\)-\.]+)\]", str1)
then it gives ['cm', 'km'], but I want the output to be ['cm', '', 'km'], since z has no unit associated. How can I achieve this?
Similarly, for the input string T(x[g],y,z[m])[kg]
the output should be ['g','','m','kg']
CodePudding user response:
You can split the string and the problem gets easier. To extract the unit from each substring you can write:
import re

def extract_unit(s: str) -> str:
    # Non-greedy search for the first [...] group in the substring.
    match = re.search(r"\[(.*?)\]", s)
    return s[match.start() + 1: match.end() - 1] if match else ""
and to create the list you can add the following code:
l = [extract_unit(s) for s in str1.split(',')]
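Putting the two pieces together, here is a minimal runnable sketch of the split approach. The wrapper name units_by_split is only illustrative and is not part of the answer above. Note that it returns one entry per comma-separated field, so for the second input from the question the trailing [kg] is not picked up, because re.search stops at the first bracket in 'z[m])[kg]'.

```python
import re

def extract_unit(s: str) -> str:
    """Return the first bracketed unit in s, or "" if there is none."""
    match = re.search(r"\[(.*?)\]", s)
    return match.group(1) if match else ""

def units_by_split(text: str) -> list:
    # Hypothetical helper: one result per comma-separated field.
    return [extract_unit(field) for field in text.split(",")]

print(units_by_split("x[cm],z,y[km]"))       # ['cm', '', 'km']
print(units_by_split("T(x[g],y,z[m])[kg]"))  # ['g', '', 'm']  (the [kg] is missed)
```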
CodePudding user response:
You can use this regex:
((?<!\])|(?<=\[)[^\[\],]*)\]?(?:,|\)|$)
Explanation:
(                # open capturing group
  (?<!\])        # the match is not preceded by a closing square bracket,
                 # so match an empty string
  |              # OR
  (?<=\[)        # the match is preceded by an opening square bracket
  [^\[\],]*      # match zero or more characters that are neither square brackets nor commas
)                # close capturing group
\]?              # consume an optional closing square bracket
(?:,|\)|$)       # consume a comma or a closing parenthesis, or match the end of the string
Because the pattern contains exactly one capturing group, re.findall returns the content of that group for every match.
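For reference, a short sketch of how this pattern could be used on the two inputs from the question. The names UNIT_RE and find_units are just illustrative.

```python
import re

# The pattern from this answer, compiled once for reuse.
UNIT_RE = re.compile(r"((?<!\])|(?<=\[)[^\[\],]*)\]?(?:,|\)|$)")

def find_units(text: str) -> list:
    # With a single capturing group, findall returns that group's content
    # for each match (an empty string for fields without a unit).
    return UNIT_RE.findall(text)

print(find_units("x[cm],z,y[km]"))       # ['cm', '', 'km']
print(find_units("T(x[g],y,z[m])[kg]"))  # ['g', '', 'm', 'kg']
```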
CodePudding user response:
This is certainly not an expert regex solution, but it works: first reformat your input so that every field without a unit gets a placeholder bracket, then use a simple regex to catch what you want.
import re
str1 = 'T(x[g],y,z[m])[kg]'
str1 = ''.join([x if '[' in x else x + '[ ]' for x in str1.split(',')])
print(re.findall(r'\[([\w\s]+)\]', str1))
Output:
['g', ' ', 'm', 'kg']
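If you need a truly empty string rather than ' ' for the missing unit, a small variation of the same idea might do: pad with '[]' and allow zero characters inside the brackets. This is only a sketch; the function name is made up, and it assumes units consist of word characters only (something like 'm/s' would need a wider character class).

```python
import re

def units_with_placeholders(text: str) -> list:
    # Append "[]" to every comma-separated field that has no bracket at all.
    padded = "".join(f if "[" in f else f + "[]" for f in text.split(","))
    # \w* (zero or more) lets the empty placeholder match as "".
    return re.findall(r"\[(\w*)\]", padded)

print(units_with_placeholders("T(x[g],y,z[m])[kg]"))  # ['g', '', 'm', 'kg']
print(units_with_placeholders("x[cm],z,y[km]"))       # ['cm', '', 'km']
```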