Remove unnecessary white spaces in brackets and braces

Time:04-13

I have a bunch of sentences with extra whitespace inside every pair of brackets, parentheses, or braces. Some of these pairs are nested inside each other, which is causing me problems. For example:

[in]: sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
[in]: pattern = r'(\s([?,.!"]))|(?<=\{|\[|\()(.*?)(?=\)|\]|\})'
[in]: re.sub(pattern, lambda x: x.group().strip(), sentence)
[out]: '{ia} ({ fascia} antebrachii ). Genom att aponeurosen fäster i armb'

As shown here, the unnecessary whitespace inside the nested brackets/parentheses/braces is not removed. How do I cover these overlapping or nested cases? Thanks.

Expected output:
'{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'

CodePudding user response:

You can remove any whitespace that follows an opening bracket or precedes a closing bracket with this regex:

(?<=[\[{(])\s+|\s+(?=[\]})])

(?<=[\[{(])\s+ - matches whitespace preceded by one of [{(

\s+(?=[\]})]) - matches whitespace followed by one of ]})

In Python:

import re

sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
re.sub(r'(?<=[\[{(])\s+|\s+(?=[\]})])', '', sentence)

Output:

{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb
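
If the same cleanup is needed in several places, the pattern can be compiled once and wrapped in a small helper. This is only a sketch: the names EDGE_SPACE and tighten_brackets are made up for illustration and are not part of the answer above.

```python
import re

# Whitespace right after an opening bracket, or right before a closing one.
# EDGE_SPACE and tighten_brackets are hypothetical names used for this sketch.
EDGE_SPACE = re.compile(r'(?<=[\[{(])\s+|\s+(?=[\]})])')

def tighten_brackets(text):
    """Remove whitespace just inside (), [] and {}; other spaces are kept."""
    return EDGE_SPACE.sub('', text)

sentence = '{ ia } ( { fascia } antebrachii ). Genom att aponeurosen fäster i armb'
print(tighten_brackets(sentence))
# {ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb

# Nesting depth does not matter, because only the character next to the
# whitespace is inspected; spaces between words are left alone.
print(tighten_brackets('[ a, ( b  c ) ]'))
# [a, (b  c)]
```

Because each lookaround checks only the neighbouring character, the pattern does not need to track which bracket pairs with which, and that is why nested pairs are handled the same way as flat ones.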

CodePudding user response:

You may try this:

" +(?=[^(]*\))| +(?=[^{]*\})"

Explanation:

  1. " +" (a space followed by +) looks for one or more spaces
  2. (?=[^(]*\)) positive lookahead to check that the matched spaces are inside parentheses, i.e. a ) follows with no ( in between.
  3. | or
  4. " +" looks for one or more spaces
  5. (?=[^{]*\}) positive lookahead to check that the matched spaces are inside curly braces, i.e. a } follows with no { in between.

Python source:

import re

# One or more spaces that have a ")" or "}" ahead with no new opening bracket in between
regex = r" +(?=[^(]*\))| +(?=[^{]*\})"
test_str = "{ ia } ( { fascia  } antebrachii ). Genom att aponeurosen \n"
result = re.sub(regex, "", test_str)
if result:
    print(result)
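
To see how this lookahead-based pattern behaves on the nested example, here is a small check. It assumes the quantifier after each space is +, as the "one or more spaces" wording in the explanation suggests; the variable name inside is made up for this sketch.

```python
import re

# A run of spaces counts as "inside" if a ")" lies ahead with no "(" in between,
# or a "}" lies ahead with no "{" in between. (Assumes "+" as the quantifier.)
inside = re.compile(r" +(?=[^(]*\))| +(?=[^{]*\})")

test_str = "{ ia } ( { fascia  } antebrachii ). Genom att aponeurosen"
result = inside.sub("", test_str)
print(result)
# {ia} ({fascia}antebrachii). Genom att aponeurosen
```

Note that this pattern removes every space inside the parentheses or braces, including the one between "}" and "antebrachii", so the result differs from the expected output in the question. If spaces between words must be kept, a pattern that only looks at the character next to the whitespace is the safer choice.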