Returning value after recursively iterating through XML

Time:11-29

I'm working with a deeply nested XML file in which the path to each value is critical for understanding it. The answer to "Python xml absolute path" lets me print both the path and the value.

What I can't figure out is how to output the result in a more usable way (trying to construct a dataframe listing Path and Value).

For example, from the linked example:

<A>
  <B>foo</B>
  <C>
    <D>On</D>
  </C>
  <E>Auto</E>
  <F>
    <G>
      <H>shoo</H>
      <I>Off</I>
    </G>
  </F>
</A>

from lxml import etree
root = etree.XML(your_xml_string)

def print_path_of_elems(elem, elem_path=""):
    for child in elem:
        # len(child) == 0 replaces the deprecated getchildren() check
        if len(child) == 0 and child.text:
            # leaf node with text => print
            print("%s/%s, %s" % (elem_path, child.tag, child.text))
        else:
            # node with child elements => recurse
            print_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

# start from "/A" so the printed paths are absolute
print_path_of_elems(root, "/" + root.tag)

Results in the following printout:

/A/B, foo
/A/C/D, On
/A/E, Auto
/A/F/G/H, shoo
/A/F/G/I, Off

I believe yield is the right technique, but I'm getting nowhere; my current attempt yields nothing:

from lxml import etree
root = etree.XML(your_xml_string)

def yield_path_of_elems(elem, elem_path=""):
    for child in elem:
        if not child.getchildren() and child.text:
            ylddict = {'Path':elem_path, 'Value':child.text}
            yield(ylddict)
        else:
            # node with child elements => recurse
            yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

for i in yield_path_of_elems(root):
    #print for simplicity in example, otherwise turn into DF and concat
    print(i)

From experimenting, I believe the recursion stops working correctly as soon as I use yield or return.

CodePudding user response:

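Calling a generator function only creates a generator object. Nothing iterates over the one created by your recursive call, so its values are discarded. You need to pass the values yielded by the recursive call back to the original caller. So change: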

yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

to

yield from yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

This is analogous to the way you have to use return recursive_call(...) in a normal recursive function.
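
For reference, here is a minimal sketch of the whole generator with that change applied. It makes a few assumptions that are not in the question: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, it tests len(child) == 0 rather than calling getchildren(), it includes the leaf's own tag in 'Path', and it starts from "/" + root.tag so the paths match the printout shown in the question.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<A>
  <B>foo</B>
  <C>
    <D>On</D>
  </C>
  <E>Auto</E>
  <F>
    <G>
      <H>shoo</H>
      <I>Off</I>
    </G>
  </F>
</A>"""


def yield_path_of_elems(elem, elem_path=""):
    """Yield one {'Path', 'Value'} dict per leaf element that has text."""
    for child in elem:
        child_path = "%s/%s" % (elem_path, child.tag)
        if len(child) == 0 and child.text:
            # leaf node with text => hand the row to the caller
            yield {"Path": child_path, "Value": child.text}
        else:
            # node with child elements => re-yield everything the
            # recursive generator produces
            yield from yield_path_of_elems(child, child_path)


root = ET.fromstring(XML)
rows = list(yield_path_of_elems(root, "/" + root.tag))
for row in rows:
    print(row)
```

Since rows is a plain list of dicts, passing it to pandas.DataFrame (assuming pandas is installed) should give a frame with Path and Value columns, with no need to concatenate per-row frames.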
