I'm working with a deeply nested XML file, and each element's path is critical for understanding its value. This answer lets me print both the path and the value: Python xml absolute path
What I can't figure out is how to return the result in a more usable form (I'm trying to construct a dataframe with Path and Value columns).
For example, from the linked example:
<A>
    <B>foo</B>
    <C>
        <D>On</D>
    </C>
    <E>Auto</E>
    <F>
        <G>
            <H>shoo</H>
            <I>Off</I>
        </G>
    </F>
</A>
from lxml import etree

root = etree.XML(your_xml_string)

def print_path_of_elems(elem, elem_path=""):
    for child in elem:
        if not child.getchildren() and child.text:
            # leaf node with text => print
            print("%s/%s, %s" % (elem_path, child.tag, child.text))
        else:
            # node with child elements => recurse
            print_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

# start with a leading slash so the paths print as /A/B rather than A/B
print_path_of_elems(root, "/" + root.tag)
Results in the following printout:
/A/B, foo
/A/C/D, On
/A/E, Auto
/A/F/G/H, shoo
/A/F/G/I, Off
I believe yield is the correct technique, but I'm getting nowhere. My current attempt returns nothing:
from lxml import etree

root = etree.XML(your_xml_string)

def yield_path_of_elems(elem, elem_path=""):
    for child in elem:
        if not child.getchildren() and child.text:
            ylddict = {'Path':elem_path, 'Value':child.text}
            yield(ylddict)
        else:
            # node with child elements => recurse
            yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

for i in yield_path_of_elems(root):
    # print for simplicity in example, otherwise turn into DF and concat
    print(i)
From experimenting, I believe the recursion doesn't work correctly when I use yield or return.
CodePudding user response:
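Calling a generator function only creates a generator object; unless something iterates over it, its body never runs and its values are discarded. You need to pass the values yielded by the recursive call back to the original caller. So change: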
yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))
to
yield from yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))
This is analogous to the way you have to use return recursive_call(...) in a normal recursive function.
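Putting it together, here is a minimal sketch of the generator with that change applied. To stay self-contained it makes a few assumptions that are not in the question: it uses the standard library's xml.etree.ElementTree instead of lxml, it tests for leaf nodes with len(child) == 0 rather than getchildren(), and the name iter_paths is just illustrative. It also puts the child's own tag into Path, which the attempt in the question leaves out.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<A>
    <B>foo</B>
    <C><D>On</D></C>
    <E>Auto</E>
    <F><G><H>shoo</H><I>Off</I></G></F>
</A>"""


def iter_paths(elem, elem_path=""):
    """Yield a {'Path', 'Value'} dict for every leaf element that has text."""
    for child in elem:
        child_path = "%s/%s" % (elem_path, child.tag)
        if len(child) == 0 and child.text:
            # leaf node with text => hand one row back to the caller
            yield {"Path": child_path, "Value": child.text}
        else:
            # node with child elements => re-yield everything the
            # recursive generator produces
            yield from iter_paths(child, child_path)


root = ET.fromstring(XML)
rows = list(iter_paths(root, "/" + root.tag))
for row in rows:
    print(row)
```

Because rows is a plain list of dicts, it can be passed straight to pandas.DataFrame(rows) to get the Path and Value columns, with no need to build and concat one frame per row.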