XML: Print previous Element of the findall() function

Time:05-04

I'm working with an xml corpus that looks like this:

<corpus>
  <dialogue speaker="A">
    <sentence tag1="attribute1" tag2="attribute2"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="different_attribute1" tag2= "different_attribute2"> How are you </sentence>
  </dialogue>
</corpus>

I use root.findall() to search for all sentences whose tag2 is "different_attribute2". I would then like to print not only the parent element of each matching sentence, but also the dialogue element that comes before it. The desired output is:

{'speaker': 'A'}
Hello
{'speaker': 'B'}
How are you

I'm quite new at coding, so I've tried a bunch of for loops and if statements without result. I start with:

for words in root.findall('.//sentence[@tag2="different_attribute2"]'):
    for speaker in root.findall('.//sentence[@tag2="different_attribute2"]...'):
        print(speaker.attrib)
        print(words.text)

But then I have absolutely no idea how to retrieve speaker A. Can anyone help me?

Answer:

Using lxml, a single XPath union expression can select all four values (both speakers and both sentences); the results come back in document order:

>>> from lxml import etree
>>> tree = etree.parse('/home/lmc/tmp/test.xml')
>>> for e in tree.xpath('//sentence[@tag2="different_attribute2"]/parent::dialogue/@speaker | //sentence[@tag2="different_attribute2"]/text() | //dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/sentence/text() | //dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/@speaker'):
...      print(e)
... 
A
 Hello 
B
 How are you 

XPath details

Find speaker B
//sentence[@tag2="different_attribute2"]/parent::dialogue/@speaker

Find sentence of B
//sentence[@tag2="different_attribute2"]/text()

Find sentence of A given B
//dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/sentence/text()

Find speaker A given B
//dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/@speaker
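One caveat: the following-sibling axis matches every dialogue that appears anywhere before a matching one, not only the one directly before it, so on a longer corpus the expressions above would return all earlier dialogues. If only the immediately preceding dialogue is wanted and lxml is not available, a rough standard-library sketch could look like the following. It assumes every dialogue is a direct child of the root, as in the sample; the helper name with_previous is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample corpus copied from the question.
XML = """<corpus>
  <dialogue speaker="A">
    <sentence tag1="attribute1" tag2="attribute2"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="different_attribute1" tag2="different_attribute2"> How are you </sentence>
  </dialogue>
</corpus>"""


def with_previous(root, tag2_value):
    """Hypothetical helper: return (speaker, text) pairs for each matching
    dialogue, preceded by the dialogue directly before it (if there is one)."""
    # Assumption: all <dialogue> elements are direct children of the root.
    dialogues = root.findall("dialogue")  # document order
    results = []
    for i, dialogue in enumerate(dialogues):
        # Skip dialogues without a sentence carrying the wanted tag2 value.
        if dialogue.find(f"sentence[@tag2='{tag2_value}']") is None:
            continue
        # Slice covers the previous dialogue (when i > 0) and the match itself.
        for d in dialogues[max(i - 1, 0):i + 1]:
            for sentence in d.findall("sentence"):
                results.append((d.get("speaker"), (sentence.text or "").strip()))
    return results


root = ET.fromstring(XML)
for speaker, text in with_previous(root, "different_attribute2"):
    print({"speaker": speaker})
    print(text)
```

ElementTree's XPath subset has no sibling axes, which is why the sketch indexes into the list of dialogues instead of using following-sibling or preceding-sibling.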
