String to match:
{abc}
Strings to not match:
$${abc{abc}{abc}}$$
How do I satisfy this requirement with regex?
The context is that I am trying to match {abc} elements for replacement with Python, but I don't want them mixed up with MathJax equations such as $${abc{abc}{abc}}$$ in an HTML file.
I understand that [^$]{.+} will work somewhat for strings such as "$${}$$", but not for others with nested brackets (with content inside the nested brackets) such as "$${{abc}}$$", which shouldn't be matched.
Sample:
Match:
{element 1}
{element 2}
{element_abc}
Don't match:
$${abc{abc}{abc}}$$
$${mathjax{}{mathjax}}$$
$${mathjax{}{}{}{}{{{mathjax}}}}$$
With matches of:
{element 1}
{element 2}
{element_abc}
The search doesn't need to scan recursively for intermixed elements: {$${}$$} can match (this is not possible in my actual text, so a match can be made if necessary).
A line containing both an {abc} and a $${abc}$$, such as {abc} abc $${abc}$$, may be possible.
I am using regex 2021.11.10, installed via pip.
CodePudding user response:
If you simply need to match the expressions on individual lines, all you need is to add line anchors.
^\{[^{}]+\}$
If your input is a single string with multiple lines in it, you'll need to add the re.MULTILINE flag to say that ^ and $ should match at internal newlines, too.
>>> import re
>>> re.findall(r'^\{[^{}]+\}$', '''
... {foo}
... $${bar{baz}}
... {quux}
... ick
... ''', re.MULTILINE)
['{foo}', '{quux}']
This is portable back to the standard Python re module, too.
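Since the question is about replacement, the anchored pattern can also be passed to re.sub. The following is a minimal sketch, assuming each placeholder sits alone on its line; the replacements table and the replace_placeholder helper are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical lookup table, used only for illustration.
replacements = {"foo": "FOO", "quux": "QUUX"}

# Same anchored pattern as above, with a group capturing the text inside the braces.
line_pattern = re.compile(r'^\{([^{}]+)\}$', re.MULTILINE)

def replace_placeholder(match):
    # Fall back to the original text when the placeholder is unknown.
    return replacements.get(match.group(1), match.group(0))

text = "{foo}\n$${bar{baz}}$$\n{quux}\nick\n"
result = line_pattern.sub(replace_placeholder, text)
print(result)
```

Note that a placeholder sharing a line with other text, such as {abc} abc $${abc}$$, is not matched by this anchored pattern.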
CodePudding user response:
With the PyPI regex library, you can use a SKIP-FAIL, recursion-based regex like
\$\$({(?:[^{}]+|(?1))*})\$\$(*SKIP)(*F)|{([^{}]*)}
See the regex demo. Details:
\$\$({(?:[^{}]+|(?1))*})\$\$(*SKIP)(*F):
  \$\$ - a $$ string
  ({(?:[^{}]+|(?1))*}) - Group 1: a {, then zero or more occurrences of either one or more chars other than { and }, or the same Group 1 pattern recursed, and then a }
  \$\$ - a $$ string
  (*SKIP)(*F) - "forget" the text matched up to this moment: the match fails and the next attempt starts after the skipped text
| - or
{([^{}]*)} - a {, then Group 2 capturing any zero or more chars other than { and }, and then a }.
In Python, you can use
import regex
text = '{element 1} {element 2} {element_abc} $${abc{abc}{abc}}$$ $${mathjax{}{mathjax}}$$ $${mathjax{}{}{}{}{{{mathjax}}}}$$'
pattern = regex.compile(r'\$\$({(?:[^{}]+|(?1))*})\$\$(*SKIP)(*F)|{([^{}]*)}')
print([match.group() for match in pattern.finditer(text)])
# => ['{element 1}', '{element 2}', '{element_abc}']
print([match.group(2) for match in pattern.finditer(text)])
# => ['element 1', 'element 2', 'element_abc']
See this online demo.
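If installing the regex package is not an option, a similar effect can be approximated with the standard re module. This is a different technique from SKIP-FAIL, not an equivalent: an alternation first consumes each $$...$$ block as a whole, and a replacement function returns such blocks unchanged. The sketch assumes a math block never contains $$ inside it, and it does not check that braces are balanced; the to_upper helper is a hypothetical stand-in for the real replacement.

```python
import re

# First alternative: consume a whole $$...$$ block (no capture group set).
# Second alternative: a plain {...} placeholder, captured in group 1.
pattern = re.compile(r'\$\$.*?\$\$|\{([^{}]*)\}', re.DOTALL)

def to_upper(match):
    # Group 1 is None when the MathJax alternative matched: leave it as is.
    if match.group(1) is None:
        return match.group(0)
    # Hypothetical replacement: upper-case the placeholder name.
    return '{' + match.group(1).upper() + '}'

text = '{element 1} abc $${abc{abc}{abc}}$$ {element_abc} $${mathjax{}{}{{{mathjax}}}}$$'
result = pattern.sub(to_upper, text)
print(result)
```

Because the math alternative comes first and matches from the opening $$, the braces inside an equation are consumed before the placeholder alternative ever gets a chance to see them.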