I have this string:
row = <line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>
If <b> has no value, I need the whole line to be deleted from my string. In this case that is:
<line><a>444</a><b></b><c>555</c></line>
Should I split my row into an array, check each line, and then concatenate the lines that do not hold an empty <b> into a new string? Is there an easier or smarter way?
CodePudding user response:
Using re.sub, we can try:
import re
row = '<line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>'
output = re.sub(r'<line>(?:(?!<line>).)*<b></b>.*?</line>', '', row)
print(output) # <line><a>111</a><b>222</b><c>333</c></line>
The (?:(?!<line>).)* term in the regex used above matches any content without crossing into another opening <line> tag, and is often referred to as a "tempered dot." The idea is that the match must start at the <line> element that actually contains the empty <b> tag, so only that element is removed, up to its first closing </line>.
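To see why the tempered dot matters, here is a small comparison against a plain lazy dot on the same input. The variable names naive and tempered are only for illustration.

```python
import re

row = '<line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>'

# Plain lazy dot: the match starts at the first <line> and keeps expanding
# until it finds <b></b>, which lives in the second <line>.
naive = re.sub(r'<line>.*?<b></b>.*?</line>', '', row)

# Tempered dot: the part before <b></b> may not step over another <line>,
# so the match can only start at the <line> that holds the empty <b>.
tempered = re.sub(r'<line>(?:(?!<line>).)*<b></b>.*?</line>', '', row)

print(repr(naive))     # ''
print(repr(tempered))  # '<line><a>111</a><b>222</b><c>333</c></line>'
```

With the plain lazy dot both lines are removed, because the match is allowed to begin in the first line and end in the second.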
Note that for a more general solution, consider using an XML parser. The regex happens to work here on the assumption that all tags inside <line> are single level and not deeply nested.
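As a rough sketch of the parser route, the standard library's xml.etree.ElementTree could be used. Since the string has several top-level <line> elements, this assumes it is acceptable to wrap it in a temporary <root> element (a made-up name) before parsing. It also assumes that a <b> containing only whitespace counts as empty, and that a line with no <b> at all is kept, as the regex would do.

```python
import xml.etree.ElementTree as ET

row = '<line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>'

# The fragment has no single root, so wrap it in a temporary one (assumption).
root = ET.fromstring(f'<root>{row}</root>')

kept = []
for line in root.findall('line'):
    b = line.find('b')
    # Drop the line only when <b> exists and has no (non-whitespace) text.
    if b is not None and not (b.text or '').strip():
        continue
    # short_empty_elements=False keeps <x></x> instead of <x /> for other empty tags.
    kept.append(ET.tostring(line, encoding='unicode', short_empty_elements=False))

output = ''.join(kept)
print(output)  # <line><a>111</a><b>222</b><c>333</c></line>
```

This is longer than the one-line regex, but it does not depend on the exact spelling of the empty tag or on how the elements inside <line> are laid out.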