How to remove a substring from a string according to data inside

Time:03-02

I have this string:

 row =   <line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>

If <b> is empty, I need the whole <line> element removed from my string. In this case, that means removing:

  <line><a>444</a><b></b><c>555</c></line>

Should I split my row into an array, check each line, and then concatenate the lines that don't have an empty <b> into a new string? Or is there an easier/smarter way?

CodePudding user response:

Using re.sub we can try:

import re
row = '<line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>'
# Remove every <line> element that contains an empty <b></b>
output = re.sub(r'<line>(?:(?!<line>).)*<b></b>.*?</line>', '', row)
print(output)  # <line><a>111</a><b>222</b><c>333</c></line>

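The (?:(?!<line>).)* term in the regex above is often called a "tempered dot". It matches one character at a time, but only where that character does not begin another <line> opening tag. This keeps the match from starting in one <line> element and running into the next, so the empty <b></b> we find belongs to the same <line> where the match began. The trailing .*?</line> then consumes the rest of that element.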
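To see what the tempered dot buys us, here is a small comparison against a plain lazy dot on the same input. This is only a sketch; the variable names naive and tempered are made up for the illustration.

```python
import re

row = '<line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>'

# Plain lazy dot: the match starts at the first <line> and keeps expanding
# until it reaches the <b></b> in the second element, so both lines are removed.
naive = re.sub(r'<line>.*?<b></b>.*?</line>', '', row)

# Tempered dot: the match may not run past another <line> opening tag,
# so only the element that actually holds the empty <b></b> is removed.
tempered = re.sub(r'<line>(?:(?!<line>).)*<b></b>.*?</line>', '', row)

print(repr(naive))
print(tempered)
```

With the plain lazy dot the whole string is deleted, while the tempered version keeps the first <line> element.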

For a more general solution, consider using an XML parser. The regex works here only under the assumption that the tags inside <line> are a single level deep, with no nested <line> elements, and that an empty <b> is written exactly as <b></b> (not <b/> or <b> </b>).
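As a rough sketch of the parser route, something like the following could work. It assumes Python's standard xml.etree.ElementTree and a throwaway <root> wrapper (needed because the row has several top-level elements). The helper name drop_lines_with_empty_b is hypothetical, and I'm assuming that a <line> with no <b> at all should be kept.

```python
import xml.etree.ElementTree as ET

row = '<line><a>111</a><b>222</b><c>333</c></line><line><a>444</a><b></b><c>555</c></line>'

def drop_lines_with_empty_b(row):
    # Hypothetical helper: wrap the fragments in a dummy root so they parse
    # as a single well-formed XML document.
    root = ET.fromstring('<root>' + row + '</root>')
    for line in root.findall('line'):
        b = line.find('b')
        # Assumption: "empty" means no text or whitespace only;
        # a <line> without any <b> child is left in place.
        if b is not None and not (b.text or '').strip():
            root.remove(line)
    # Serialize the surviving <line> elements back into one string.
    return ''.join(ET.tostring(line, encoding='unicode') for line in root)

print(drop_lines_with_empty_b(row))
```

Unlike the regex, this also catches <b/> and <b> </b>. One thing to be aware of: ElementTree re-serializes the elements, so any other empty tags in the lines that are kept may come back in the short form (for example <a />).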
