Removing an arbitrary sequence of characters inside brackets from a string

Time:04-01

I want to remove the characters inside square brackets from a string, regardless of which characters they are or how many there are. The type of brackets and their order do not change. I also want to remove the square brackets themselves.

For example:

my_string1 = 'this[123]'
my_string2 = 'is[7]'
my_string3 = 'my[i]'
my_string4 = 'example[jk]'

Desired output:

my_string1 = 'this'
my_string2 = 'is'
my_string3 = 'my'
my_string4 = 'example'

Using re.sub() does not work for me:

import re
my_string1 = 'this[112]'
print(re.sub("[[]|[]]", "", my_string1))

The best output I got:

'this112'
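A minimal sketch of why the attempt falls short: `[[]` is a character class containing only `[`, and `[]]` is a class containing only `]`, so the alternation deletes the two bracket characters one at a time and never touches what lies between them. To drop the contents too, the pattern has to match the opening bracket, the contents, and the closing bracket as one unit. The variable names below are illustrative only.

```python
import re

s = "this[112]"

# Equivalent, escaped form of the attempted pattern: it matches a lone "["
# or a lone "]", so only the bracket characters themselves are deleted.
brackets_only = re.sub(r"\[|\]", "", s)
print(brackets_only)  # this112

# Matching "[", then any run of non-"]" characters, then "]" as a single
# unit removes the brackets together with their contents.
whole_group = re.sub(r"\[[^\]]*\]", "", s)
print(whole_group)  # this
```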

User response:

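Use the pattern \[.+?\], which matches a literal [, followed by one or more characters, followed by a literal ]. The non-greedy ? makes each match stop at the first ] it reaches, so a string containing several bracketed sequences has each one removed separately instead of everything from the first [ to the last ]: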

import re
my_string1 = 'this[123]'
my_string2 = 'is[7]'
my_string3 = 'my[i]'
my_string4 = 'example[jk]'

for s in [my_string1, my_string2, my_string3, my_string4]:
    # Delete every '[' ... ']' group, contents included
    print(re.sub(r'\[.+?\]', '', s))

This outputs:

this
is
my
example
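To see what the non-greedy ? buys, here is a small comparison of the greedy and non-greedy forms on a string with two bracketed groups (the sample string is made up for illustration):

```python
import re

s = "a[1]b[22]c"

# Greedy: .+ runs to the LAST "]" in the string, swallowing "b" as well.
greedy = re.sub(r"\[.+\]", "", s)

# Non-greedy: .+? stops at the FIRST "]" after each "[".
lazy = re.sub(r"\[.+?\]", "", s)

print(greedy)  # ac
print(lazy)    # abc
```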

User response:

Assuming the brackets are not nested, use the following regex ...

\[[^\]]*\]

... and replace matches with the empty string:

  1. \[ - Matches '['
  2. [^\]]* - Matches 0 or more characters that are not ']'.
  3. \] - Matches ']'.
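As a sketch, the same regex can be written with re.VERBOSE so that each piece carries the explanation from the list above. Because [^\]]* allows zero characters, it also removes empty brackets, whereas \[.+?\] needs at least one character and can therefore run past an empty [] into the next group. The name BRACKETED and the sample strings are hypothetical.

```python
import re

# Same regex as above, spelled out with re.VERBOSE (whitespace and
# comments inside the pattern are ignored).
BRACKETED = re.compile(
    r"""
    \[        # a literal '['
    [^\]]*    # zero or more characters that are not ']'
    \]        # a literal ']'
    """,
    re.VERBOSE,
)

print(BRACKETED.sub("", "example[jk]"))  # example

# Empty brackets are removed too, because * allows zero characters.
print(BRACKETED.sub("", "x[]y[1]"))  # xy

# The one-or-more pattern cannot match "[]" on its own, so its match
# stretches from the first "[" to the "]" that closes "[1]".
print(re.sub(r"\[.+?\]", "", "x[]y[1]"))  # x
```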

The code:

import re

tests = [
    'this[123]',
    'is[7]',
    'my[i]',
    'example[jk]',
]

for test in tests:
    test = re.sub(r'\[[^\]]*\]', '', test)
    print(test)

Prints:

this
is
my
example
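The regex above assumes the brackets are not nested; on input like a[b[c]d]e a single pass leaves ad]e behind. If nesting can occur, one common workaround (not part of either answer) is to remove only innermost groups with \[[^\[\]]*\] and repeat until the string stops changing. A minimal sketch, where strip_brackets is a hypothetical helper name:

```python
import re

# Matches an innermost bracket group: '[' + no '[' or ']' + ']'.
INNERMOST = re.compile(r"\[[^\[\]]*\]")


def strip_brackets(text: str) -> str:
    """Remove bracketed groups, including nested ones, by peeling off
    innermost groups until no more substitutions are made."""
    while True:
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text


# A single pass of the non-nested regex leaves residue behind.
print(re.sub(r"\[[^\]]*\]", "", "a[b[c]d]e"))  # ad]e
print(strip_brackets("a[b[c]d]e"))  # ae
print(strip_brackets("this[123]"))  # this
```

Each pass that makes a substitution shortens the string, so the loop always terminates; an unmatched [ or ] is simply left in place.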