I am using Python 3 regular expressions to process text that contains <span> tags. The goal is to remove all <span> tags. According to the re documentation, re.sub accepts a count argument, and count=0 (the default) means all occurrences are replaced.
The sample code is here:
import re
text = "\n<span><div>\n<span>Test string</span>\n</div></span>\n"
patten = re.compile('(.*)(<span .*?>|<span>)(.*?)</span>(.*)',re.IGNORECASE|re.MULTILINE|re.DOTALL)
text1=patten.sub(r'\1\n\3\n\4', text)
print("before:" + text + "\n" + "after:" + text1)
The output is here:
before:
<span><div>
<span>Test string</span>
</div></span>
after:
<span><div>
Test string
</div></span>
The input string has two <span> tags, and I expect the output to contain none. Instead, the code removes only one of them and leaves the other in place. What is wrong with my code? Thanks very much.
Qian
CodePudding user response:
Hope this helps you.
import re
text = "\n<span><div>\n<span>Test string</span>\n</div></span>\n"
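# Match every opening or closing span tag, with or without attributes.
patten = re.compile(r'</?span[^>]*>', re.IGNORECASE)
# Call sub() on the compiled pattern so the IGNORECASE flag is actually applied.
text1 = patten.sub('', text)
print("before:" + text + "\n" + "after:" + text1)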
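As for why the original code removed only one tag: as far as I can tell, the leading and trailing (.*) groups together with re.DOTALL let one match cover the entire input, so there is nothing left for a second substitution even with count=0. The sketch below uses re.subn to count the substitutions each pattern makes. The names whole, per_tag, whole_count, tag_count and cleaned are just illustrative, and the \b in the second pattern is my own addition so that a tag such as <spanner> would be left alone.

```python
import re

text = "\n<span><div>\n<span>Test string</span>\n</div></span>\n"

# Pattern from the question: the (.*) groups plus re.DOTALL allow a single
# match to span the whole string, so only one substitution can happen.
whole = re.compile(r'(.*)(<span .*?>|<span>)(.*?)</span>(.*)',
                   re.IGNORECASE | re.MULTILINE | re.DOTALL)
_, whole_count = whole.subn(r'\1\n\3\n\4', text)

# Per-tag pattern: each opening or closing tag is matched on its own.
# The \b (an addition, not in the answer above) stops it matching <spanner>.
per_tag = re.compile(r'</?span\b[^>]*>', re.IGNORECASE)
cleaned, tag_count = per_tag.subn('', text)

print("substitutions with the original pattern:", whole_count)
print("substitutions with the per-tag pattern:", tag_count)
print(repr(cleaned))
```

For anything beyond simple, well-formed tags like these, an HTML parser is usually a safer choice than a regex.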