I want to remove parenthesized table or figure references from strings. A single pair of parentheses can contain multiple references. The pattern is fairly consistent. Here are some examples:
text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]
I'm using the following regex:
text = re.sub(r"\(([\w]\s\d(\\[a-z] [0-9]) )\)", " ", text)
CodePudding user response:
You can remove any parenthesized substring that starts with table or figure:
re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', text)
See the regex demo. Details:
\s*\(\s* - a ( char enclosed with zero or more whitespaces on both ends
(?:table|figure) - table or figure string
[^()]* - zero or more chars other than ( and )
\) - a ) char.
See the Python demo:
import re
text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]
text = [re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', t) for t in text]
print(text)
# => ['this is a figure ref', 'this is multiple refs', 'this is a table ref']
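If it helps to keep the explanation next to the pattern, the same regex can be written with re.VERBOSE so each part carries its own comment. This is only a sketch: the names REF_PARENS and strip_refs are made up for illustration, and the re.IGNORECASE flag is an assumption added here (the original pattern is case-sensitive) so that "(Figure 7)" is also removed.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
# re.IGNORECASE is an extra assumption, not part of the original pattern.
REF_PARENS = re.compile(
    r"""
    \s*                 # optional whitespace before the opening parenthesis
    \( \s*              # a literal ( followed by optional whitespace
    (?:table|figure)    # the parenthesized text must start with table or figure
    [^()]*              # any chars other than ( and )
    \)                  # a literal )
    """,
    re.VERBOSE | re.IGNORECASE,
)

def strip_refs(s):
    """Remove every parenthesized table/figure reference from s."""
    return REF_PARENS.sub("", s)

samples = [
    "this is a figure ref (Figure 7)",
    "keep this (see note) intact",
    "two refs (table 1) in one (figures 2 and 3) string",
]
print([strip_refs(s) for s in samples])
# => ['this is a figure ref', 'keep this (see note) intact', 'two refs in one string']
```

Parentheses that do not start with table or figure are left alone. Note that the pattern would also match something like "(tablet ...)"; adding a check after the keyword is an option if that matters for your data.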