I have a bunch of sentences with extra whitespace inside every pair of brackets/parentheses/braces. Some of these pairs are nested inside each other, which is giving me problems, e.g.:
[in]: sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
[in]: pattern = r'(\s([?,.!"]))|(?<=\{|\[|\()(.*?)(?=\)|\]|\})'
[in]: re.sub(pattern, lambda x: x.group().strip(), sentence)
[out]: '{ia} ({ fascia} antebrachii ). Genom att aponeurosen fäster i armb'
As shown here, I have failed to remove the unnecessary white spaces in the overlapped brackets/parentheses/braces. How do I cover these overlapping or nested cases? Thanks.
Expected output:
'{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
CodePudding user response:
You can remove any whitespace that follows an opening bracket or precedes a closing bracket by replacing matches of this regex with an empty string:
(?<=[\[{(])\s+|\s+(?=[\]})])
(?<=[\[{(])\s+
- looks for one or more whitespace characters preceded by one of [{(
\s+(?=[\]})])
- looks for one or more whitespace characters followed by one of ]})
In Python:
import re
sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
re.sub(r'(?<=[\[{(])\s+|\s+(?=[\]})])', '', sentence)
Output:
{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb
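As a minimal sketch, the lookbehind/lookahead pattern could be wrapped in a small helper so that it is compiled once and reused. The names `BRACKET_PADDING` and `strip_bracket_padding` are hypothetical, chosen here for illustration. Because the pattern only matches whitespace directly next to a bracket, spaces between words inside the brackets are left alone, and nesting needs no special handling.

```python
import re

# Whitespace right after an opening bracket, or right before a closing one.
BRACKET_PADDING = re.compile(r'(?<=[\[{(])\s+|\s+(?=[\]})])')


def strip_bracket_padding(text):
    """Remove whitespace on the inner edges of (), [] and {}."""
    return BRACKET_PADDING.sub('', text)


sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
print(strip_bracket_padding(sentence))
# {ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb

# Runs of several spaces are removed too; the space inside "a b" is kept.
print(strip_bracket_padding('[  a b  ] ( c )'))
# [a b] (c)
```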
CodePudding user response:
You may try this:
" (?=[^(]*\))| (?=[^{]*\})"
Explanation:
" (?=[^(]*\))"
- a space followed by a positive lookahead that checks whether the space is inside parentheses (a ")" comes before any further "(").
|
- or
" (?=[^{]*\})"
- a space followed by a positive lookahead that checks whether the space is inside curly braces (a "}" comes before any further "{").
Python source:
import re
regex = r" (?=[^(]*\))| (?=[^{]*\})"
test_str = ("{ ia } ( { fascia } antebrachii ). Genom att aponeurosen \n")
result = re.sub(regex, "", test_str)
if result:
    print(result)
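One thing worth checking before relying on this pattern: its lookaheads match every space that sits anywhere inside a bracket pair, not only the padding next to the brackets. The sketch below runs it on the sample sentence from the question (the variable names are illustrative). The space between `{fascia}` and `antebrachii` is removed as well, which differs from the expected output given in the question.

```python
import re

# Matches a space if a ")" follows before any "(", or a "}" before any "{".
inside_any = r" (?=[^(]*\))| (?=[^{]*\})"

sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen'
result = re.sub(inside_any, '', sentence)
print(result)
# {ia} ({fascia}antebrachii). Genom att aponeurosen
```

If spaces between words inside the brackets must survive, a pattern that only targets whitespace adjacent to a bracket is the safer choice.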