Parsing an XML child element back as a string

Time:04-24

I'm trying to parse a complex XML document, and XPath isn't behaving the way I expected. Here's my sample XML:

<project>
    <samples>
        <sample>show my balance</sample>
        <sample>show me the <subsample value='USD'>money</subsample>today</sample>
    </samples>
</project>

Here's my Python code:

from lxml import etree

somenode="<project><samples><sample>show my balance</sample><sample>show me the <subsample value='USD'>money</subsample>today</sample></samples></project>"

somenode_etree = etree.fromstring(somenode)

for x in somenode_etree.iterfind(".//sample"):
    print(etree.tostring(x))

I get the output:

b'<sample>show my balance</sample><sample>show me the <subsample value="USD">money</subsample>today</sample></samples></project>'
b'<sample>show me the <subsample value="USD">money</subsample>today</sample></samples></project>'

when I expected:

show my balance
show me the <subsample value="USD">money</subsample>today

What am I doing wrong?

CodePudding user response:

This XPath returns the text nodes and child elements of each sample, in document order:

result = somenode_etree.xpath(".//sample/text() | .//sample/*")
result
['show my balance', 'show me the ', <Element subsample at 0x7f0516cfa288>, 'today']

Printing the found nodes, as the OP requested:

for x in somenode_etree.xpath(".//sample/text() | .//sample/*[node()]"):
    if isinstance(x, etree._Element):
        print(etree.tostring(x, method='xml').decode('UTF-8'))
    else:
        print(x)

Result

show my balance
show me the 
<subsample value="USD">money</subsample>today
today

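Note that "today" is printed twice. This is not a bug in etree.tostring(): by default lxml serializes an element together with its tail text (the text that follows the element's closing tag), so the subsample element is printed with "today" attached, and then the same text node is printed again on its own. Passing with_tail=False to etree.tostring() suppresses the tail.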

Or, to print only the text of each child element and drop its markup:

>>> for x in somenode_etree.xpath(".//sample/text() | .//sample/*"):
...     if isinstance(x, etree._Element):
...         print(x.text)
...     else:
...         print(x)
... 
show my balance
show me the 
money
today