For example, for a page with this source code:
<h1 id='page-title'>This is a page</h1>
<div class='page-part'>
<button id='red-button' style='background-color:Red'>I'm a button</button>
<button id='blue-button' style='background-color:Blue'>I'm a button</button>
</div>
I want to get:
<h1>This is a page</h1>
<div>
<button>I'm a button</button>
<button>I'm a button</button>
</div>
How can I do this?
CodePudding user response:
Then let's do it in Python, using ElementTree and XPath:
import xml.etree.ElementTree as ET
# I changed your HTML a bit to make sure the code works
mu = """
<html>
<h1 id='page-title'>This is a page</h1>
<div class='page-part'>
<button id='red-button' style='background-color:Red'>I'm a button</button>
<button id='blue-button' value = "yo" style='background-color:Blue'>I'm also a button</button>
</div>
<div>I have no attributes</div>
</html>
"""
doc1 = ET.fromstring(mu)
to_del = []  # initialize a list of attribute names to delete
for elem in doc1.findall('.//*'):  # get all descendant elements of the root
    # get all attribute names and add them to the list
    to_del.extend(list(elem.attrib.keys()))
# once the attribute list is ready, eliminate duplicates, iterate
# over the list and remove each attribute from every element
for td in set(to_del):
    for elem in doc1.findall('.//*'):
        # delete the attribute (no error if the element does not have it)
        elem.attrib.pop(td, None)
print(ET.tostring(doc1).decode())
Output:
<html>
<h1>This is a page</h1>
<div>
<button>I'm a button</button>
<button>I'm also a button</button>
</div>
<div>I have no attributes</div>
</html>
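If you don't need the list of attribute names, a shorter variant is possible. This is only a sketch, and it assumes (like the code above) that the markup is well-formed enough for an XML parser. The function name strip_attributes and the sample string are made up for illustration. It uses iter(), which also visits the root element, and clears each element's attribute dict in one step:

```python
import xml.etree.ElementTree as ET


def strip_attributes(markup: str) -> str:
    """Return the markup with every attribute removed from every element."""
    root = ET.fromstring(markup)
    # iter() walks the root element itself and all of its descendants
    for elem in root.iter():
        # attrib is a plain dict, so clear() drops all attributes at once
        elem.attrib.clear()
    # encoding="unicode" makes tostring() return str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, shortened from the question's page
sample = (
    "<html>"
    "<h1 id='page-title'>This is a page</h1>"
    "<div class='page-part'>"
    "<button id='red-button' style='background-color:Red'>I'm a button</button>"
    "</div>"
    "</html>"
)
print(strip_attributes(sample))
```

Real-world pages are often not valid XML (unclosed tags, bare attributes), in which case ET.fromstring raises a ParseError and you would need an HTML-aware parser instead.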