lxml insert new node in tree, with parent's contents inside

Time:11-17

I have this tree:

<TEI>
<teiHeader/>
<text>
<body>
<div type="chapter">
<p rend="b"><pb n="1"/>lorem ipsum...</p>
<p rend="b">lorem <pb n="2"/> ipsum2...</p>
<p>lorem ipsum3...</p>
</div>
<div type="chapter">
<p>lorem ipsum4...</p>
<p rend="b">lorem ipsum5...</p>
<p rend="b"><pb n="3"/> lorem ipsum6...</p>
</div>
</body>
</text>
</TEI>

and I would like to change every

<p rend="b">lorem ipsum...</p>

into

<p><hi rend="b">lorem ipsum...</hi></p>

The problem is that all <pb n="X"/> tags get removed.

I tried this (root is the XML tree above):

from lxml import etree

parser = etree.XMLParser(ns_clean=True, remove_blank_text=True)
root = etree.fromstring(root, parser)
for item in root.findall(".//p[@rend='b']"):
    hi = etree.SubElement(item, "hi", rend=font_variant[variant])
    hi.text = ''.join(item.itertext())
print(etree.tostring(root, pretty_print=True, xml_declaration=True))

and I get, for instance, for the first <p>:

<p><pb n="1"/>lorem ipsum...<hi rend="b"> lorem ipsum...</hi></p>

The <pb n="1"/> is missing from inside the <hi>.

Could you help me out?

CodePudding user response:

If I understand you correctly, you are probably looking for something like this:

for p in root.xpath('//p[@rend="b"]'):
    # clone the old <p>, without its tail so the copy parses on its own
    old = etree.fromstring(etree.tostring(p, with_tail=False))
    # rename the clone to <hi>; it keeps the rend attribute and the <pb/> children
    old.tag = "hi"
    # create a new, empty <p>
    new = etree.Element("p")
    # append the clone to the new element
    new.append(old)
    new.tail = "\n"
    # replace the old <p> with the new element
    p.getparent().replace(p, new)
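
If you would rather not serialize and re-parse every paragraph, another option is to move the text and the child nodes of each <p> into a new <hi> in place. This is only a minimal sketch, assuming the rend value should simply be carried over from <p> to <hi>. The helper name wrap_in_hi and the SAMPLE string are made up for illustration, and the fallback import is there only so the snippet also runs where lxml is not installed (it sticks to calls that lxml and the standard library's ElementTree share).

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib, which offers the same calls used below.
    import xml.etree.ElementTree as etree

# Hypothetical sample, cut down from the tree in the question.
SAMPLE = (
    '<div type="chapter">'
    '<p rend="b"><pb n="1"/>lorem ipsum...</p>'
    '<p rend="b">lorem <pb n="2"/> ipsum2...</p>'
    '<p>lorem ipsum3...</p>'
    '</div>'
)


def wrap_in_hi(p):
    """Move everything inside <p rend="..."> into a new <hi rend="..."> child."""
    # Carry the rend value over to <hi> and drop it from <p>.
    hi = etree.Element("hi", rend=p.attrib.pop("rend"))
    # Text before the first child lives in p.text, so hand it to <hi>.
    hi.text, p.text = p.text, None
    # Move each child; the text following a child is its .tail and travels with it.
    for child in list(p):
        p.remove(child)
        hi.append(child)
    p.append(hi)


root = etree.fromstring(SAMPLE)
for p in root.findall(".//p[@rend='b']"):
    wrap_in_hi(p)

print(etree.tostring(root, encoding="unicode"))
```

Because nothing is flattened through itertext(), each <pb/> ends up inside <hi> together with the text around it, and paragraphs without rend="b" are left untouched.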