Regex to remove parentheses containing figure or table references

Time:11-11

I want to remove parenthesized table or figure references from strings. A single pair of parentheses can contain multiple references. The pattern is fairly consistent. Here are some examples:

text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]

I'm using the following regex, but it does not give the result I want:

text = re.sub(r"\(([\w]\s\d(\\[a-z] [0-9]) )\)", " ", text)

CodePudding user response:

You can remove any parenthesized group whose content starts with table or figure:

re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', text)

Details:

  • \s*\(\s* - a ( char with zero or more whitespace chars on both sides
  • (?:table|figure) - table or figure string
  • [^()]* - zero or more chars other than ( and ) (this also consumes the plural s in figures)
  • \) - a ) char.
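
The breakdown above can also be written as a commented pattern. This is only a sketch: it uses re.VERBOSE to spell out the same regex, and the names PAREN_REF and sample are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE so each part
# can carry its own comment. PAREN_REF is an illustrative name.
PAREN_REF = re.compile(
    r"""
    \s*\(\s*            # optional whitespace, a literal "(", optional whitespace
    (?:table|figure)    # the word "table" or "figure"
    [^()]*              # any run of characters other than "(" and ")"
    \)                  # the closing ")"
    """,
    re.VERBOSE,
)

# Hypothetical sample string, using a plain hyphen instead of an en dash.
sample = "see the layout (figures 6-28 and 6-29) for details"
cleaned = PAREN_REF.sub("", sample)
print(cleaned)
# => see the layout for details
```

Because the leading \s* is part of the match, the space before the opening parenthesis is removed together with the reference, so no double space is left behind.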

See the Python demo:

import re
text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]
text = [re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', t) for t in text]
print(text)
# => ['this is a figure ref', 'this is multiple refs', 'this is a table ref']
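
Note that the pattern above matches any parenthesized text that merely begins with the letters table or figure, so something like (tablecloth sizes) would be removed too, while a capitalized (Figure 3) would be kept. If that matters for your data, a stricter variant might look like the sketch below. It is an assumption about what you may need, not part of the original answer, and STRICT_REF and samples are illustrative names.

```python
import re

# Hypothetical stricter variant of the answer's pattern:
#   s?             allows the plural ("tables", "figures")
#   \b             requires the keyword to end at a word boundary
#   re.IGNORECASE  also accepts "Figure" / "Table"
STRICT_REF = re.compile(r"\s*\(\s*(?:table|figure)s?\b[^()]*\)", re.IGNORECASE)

samples = [
    "capitalized ref (Figure 3)",
    "plural ref (tables 2 and 4)",
    "not a ref (tablecloth sizes)",
]
result = [STRICT_REF.sub("", s) for s in samples]
print(result)
# => ['capitalized ref', 'plural ref', 'not a ref (tablecloth sizes)']
```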