Error in substituting '(' with regex in Python

Time:05-21

I have the following string:

s = r'aaa (bbb (ccc)) ddd'

and I would like to replace the innermost nested parentheses with curly braces. Desired output:

s = r'aaa (bbb {ccc}) ddd'

Let's start with the nested (. I use the following regex to find the nested parenthesis, which works pretty well:

match = re.search(r'\([^\)]+(\()', s)
print(match.group(1))
(

Then I try to make the substitution:

re.sub(match.group(1), r'\{', s)

but I get the following error:

error: missing ), unterminated subpattern at position 0

I really don't understand what's wrong.

CodePudding user response:

The problem is the first argument you pass to re.sub, which is always interpreted as a regex pattern:

sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the Match object and must return a replacement string to be used.

The pattern comes first, and because you've given it match.group(1), it's seeing '(' as the pattern. An unescaped ( opens a group, and here it is never closed, hence the error.
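
A minimal sketch of the failure and one way around it, assuming the only goal is to use the captured text as a literal pattern. re.escape is a standard function of the re module; the names error_message and escaped_result are made up for this illustration. Note that this replaces every ( in the string, so it is not yet the output the question asks for.

```python
import re

s = r'aaa (bbb (ccc)) ddd'

# A bare '(' does not compile as a pattern, because '(' opens a group.
try:
    re.sub('(', '{', s)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# re.escape turns '(' into '\(' so that it is matched literally.
escaped_result = re.sub(re.escape('('), '{', s)
print(escaped_result)  # aaa {bbb {ccc)) ddd
```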

I think what you are after is something like:

re.sub(r'\([^\)]+(\()', r'\1{', s)
'aaa ({ccc)) ddd'

CodePudding user response:

You can use

import re
s = r'aaa (bbb (ccc)) ddd'
print( re.sub(r'\(([^()]*)\)', r'{\1}', s) )
# => aaa (bbb {ccc}) ddd

Details:

  • \( - a ( char
  • ([^()]*) - Group 1 (\1): any zero or more chars other than ( and )
  • \) - a ) char.

The replacement is the Group 1 value wrapped in curly braces.
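
As a quick check of how this pattern behaves beyond the single sample, here is a small sketch. The extra sample strings and the names INNERMOST, samples and results are assumptions added for illustration. Because [^()]* cannot cross a parenthesis, only pairs that contain no other parentheses can match.

```python
import re

# Matches a "(" ... ")" pair with no parentheses in between, i.e. an innermost pair.
INNERMOST = re.compile(r'\(([^()]*)\)')

samples = ['aaa (bbb (ccc)) ddd', 'a (b (c) (d)) e', 'no parens']
results = [INNERMOST.sub(r'{\1}', text) for text in samples]

for text, result in zip(samples, results):
    print(f'{text!r} -> {result!r}')
# 'aaa (bbb (ccc)) ddd' -> 'aaa (bbb {ccc}) ddd'
# 'a (b (c) (d)) e' -> 'a (b {c} {d}) e'
# 'no parens' -> 'no parens'
```

A pair that is not nested at all, such as (x), also contains no other parentheses, so this pattern would rewrite it as well.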

CodePudding user response:

Based on your samples and attempts, try the following code, written and tested in Python 3.x.

import re
var = r'aaa (bbb (ccc)) ddd'
print( re.sub(r'(^.*?\([^(]*)\(([^)]*)\)(.*)', r'\1{\2}\3', var) )

The output for the sample shown is:

aaa (bbb {ccc}) ddd

Explanation of Python code:

  • Uses Python's re library for the regex.
  • Creates a variable named var holding the value aaa (bbb (ccc)) ddd.
  • Uses Python 3's print function to print the value returned by re.sub, which performs the substitution that produces the required output.

Explanation of the re.sub call: the regex (^.*?\([^(]*)\(([^)]*)\)(.*) (explained below) creates three capturing groups. Group 1 captures everything up to, but not including, the ( that precedes ccc; group 2 captures ccc; group 3 captures the rest of the string. The replacement \1{\2}\3 puts the pieces back together with ccc wrapped in {..}.

Explanation of regex:

(^.*?\([^(]*)  ##1st capturing group: lazily matches from the start of the string through the first (,
               ##then everything before the next occurrence of (.
\(             ##Literal (, not captured because we do not want it in the output.
([^)]*)        ##2nd capturing group: everything before the next occurrence of ).
\)             ##Literal ), not captured because we do not want it in the output.
(.*)           ##3rd capturing group: the rest of the string.
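
The commented breakdown above can also be kept inside the code itself. The following is a sketch using the standard re.VERBOSE flag, which ignores whitespace in the pattern and treats # as a comment marker. The names pattern, verbose_result and second_result, and the second sample string, are assumptions added for illustration.

```python
import re

var = r'aaa (bbb (ccc)) ddd'

pattern = re.compile(r'''
    (^.*?\([^(]*)   # group 1: start of string through the first "(", then text before the next "("
    \(              # literal "(", not captured
    ([^)]*)         # group 2: text before the next ")"
    \)              # literal ")", not captured
    (.*)            # group 3: the rest of the string
''', re.VERBOSE)

verbose_result = pattern.sub(r'\1{\2}\3', var)
print(verbose_result)  # aaa (bbb {ccc}) ddd

# Group 3 consumes the remainder of the string, so only one pair is rewritten per call.
second_result = pattern.sub(r'\1{\2}\3', 'a (b (c) (d)) e')
print(second_result)  # a (b {c} (d)) e
```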