How to match XML element with XPath if condition applies in both preceding and following elements

Time:12-27

I have the following XML:

<root>
  <tok lemma="per" xpos="SPS00">per</tok>
  <tok lemma="els" xpos="L3CP0">los</tok>
  <tok lemma="qual" xpos="PR0CP000">quals</tok>
  <tok lemma="ser" xpos="VSIP3P0">són</tok>
  <tok lemma="digne" xpos="AQ0CP00">dignes</tok>
  <tok lemma="de" xpos="SPC00">de</tok>
  <tok lemma="gloriós" xpos="AQ0FS00">gloriosa</tok>
  <tok lemma="memòria" xpos="NCFS000">memòria</tok>
  <tok xpos="CC" lemma="i">e</tok>
  <tok lemma="li" xpos="L3CSD" >li</tok>
  <tok lemma="plàcia" xpos="VMSP3S0">plàcia</tok>
  <tok lemma="molt" xpos="RG">molt</tok>
  <tok lemma="per" xpos="SPS00">per</tok>
</root>

I'm trying to use this XPath:

//tok[starts-with(@xpos, "L")]/preceding-sibling::tok[1][not(starts-with(@xpos, "V"))]/following-sibling::tok[1][not(starts-with(@xpos, "V"))]

to capture only the middle element in this sequence of elements:

  <tok lemma="per" xpos="SPS00">per</tok>
  <tok lemma="els" xpos="L3CP0">los</tok>
  <tok lemma="qual" xpos="PR0CP000">quals</tok>

My thinking was that this XPath requires the condition to be met in both the preceding and the following element. I'm obviously wrong, because it matches as long as the condition holds for the preceding sibling, even if it doesn't hold for the following one. Right now my XPath yields:

<tok lemma="els" xpos="L3CP0">los</tok>
<tok lemma="li" xpos="L3CSD">li</tok>

My goal is for the second one to be excluded, because the element following it has an 'xpos' attribute whose value starts with 'V' (you can see that in the sample XML).

What am I doing wrong? By now, I thought I had gotten the gist of XPath syntax. How does one specify in the XPath that the condition on the attribute value has to be met in the element immediately preceding it and in the one following it?

CodePudding user response:

The correct XPath for your conditions would be:

//tok[starts-with(@xpos, "L") 
      and preceding-sibling::tok[1][not(starts-with(@xpos, "V"))] 
      and following-sibling::tok[1][not(starts-with(@xpos, "V"))]]
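
As a rough cross-check, the same three conditions can be emulated in plain Python. This is only a sketch: the standard library's xml.etree.ElementTree implements a limited XPath subset without starts-with() or the sibling axes, so the neighbours are looked up by list index instead. The helper names starts_with_v and match_clitics are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <tok lemma="per" xpos="SPS00">per</tok>
  <tok lemma="els" xpos="L3CP0">los</tok>
  <tok lemma="qual" xpos="PR0CP000">quals</tok>
  <tok lemma="ser" xpos="VSIP3P0">són</tok>
  <tok lemma="digne" xpos="AQ0CP00">dignes</tok>
  <tok lemma="de" xpos="SPC00">de</tok>
  <tok lemma="gloriós" xpos="AQ0FS00">gloriosa</tok>
  <tok lemma="memòria" xpos="NCFS000">memòria</tok>
  <tok xpos="CC" lemma="i">e</tok>
  <tok lemma="li" xpos="L3CSD">li</tok>
  <tok lemma="plàcia" xpos="VMSP3S0">plàcia</tok>
  <tok lemma="molt" xpos="RG">molt</tok>
  <tok lemma="per" xpos="SPS00">per</tok>
</root>"""


def starts_with_v(tok):
    """Hypothetical helper: True if the tok's xpos attribute starts with 'V'."""
    return tok.get("xpos", "").startswith("V")


def match_clitics(root):
    """Hypothetical helper: emulate the single-predicate XPath by index."""
    toks = root.findall("tok")
    matches = []
    for i, tok in enumerate(toks):
        if not tok.get("xpos", "").startswith("L"):
            continue
        # Both sibling predicates require that the neighbour exists.
        if i == 0 or i == len(toks) - 1:
            continue
        # All conditions are tested against the same "L" node.
        if not starts_with_v(toks[i - 1]) and not starts_with_v(toks[i + 1]):
            matches.append(tok)
    return matches


root = ET.fromstring(XML)
result = [tok.text for tok in match_clitics(root)]
print(result)  # ['los']
```

Only "los" is returned here; "li" is dropped because the element after it ("plàcia") has an xpos starting with "V".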

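Your approach doesn't work because each step in your XPath moves the selection to a different node:

  • find every tok whose xpos attribute starts with "L"
  • step to its immediately preceding sibling, keeping it only if its xpos doesn't start with "V"
  • step from that sibling to its immediately following sibling, which is the same node found in the first step; its xpos starts with "L", so the not-"V" test always passes

As a result, the element after the "L" node is never examined. Putting all the conditions inside a single predicate keeps the "L" node as the context node and tests both of its neighbours.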
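
The three steps of the original expression can be traced by hand in Python. This is a sketch under some assumptions: it uses a trimmed copy of the sample (only the six relevant tokens, with the lemma attributes dropped) and plain list indices in place of the XPath axes, since the standard library's ElementTree does not support them.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: just the two "L" tokens and their neighbours.
XML = """<root>
  <tok xpos="SPS00">per</tok>
  <tok xpos="L3CP0">los</tok>
  <tok xpos="PR0CP000">quals</tok>
  <tok xpos="CC">e</tok>
  <tok xpos="L3CSD">li</tok>
  <tok xpos="VMSP3S0">plàcia</tok>
</root>"""

toks = ET.fromstring(XML).findall("tok")
xpos = [tok.get("xpos", "") for tok in toks]

# Step 1: //tok[starts-with(@xpos, "L")]
step1 = [i for i, x in enumerate(xpos) if x.startswith("L")]

# Step 2: /preceding-sibling::tok[1][not(starts-with(@xpos, "V"))]
# The selection moves one position to the left.
step2 = [i - 1 for i in step1 if i > 0 and not xpos[i - 1].startswith("V")]

# Step 3: /following-sibling::tok[1][not(starts-with(@xpos, "V"))]
# The selection moves one position to the right, back onto the "L" nodes,
# so the "V" test is applied to the "L" nodes themselves.
step3 = [
    i + 1
    for i in step2
    if i + 1 < len(toks) and not xpos[i + 1].startswith("V")
]

selected = [toks[i].text for i in step3]
print(step1, step2, step3)  # [1, 4] [0, 3] [1, 4]
print(selected)             # ['los', 'li']
```

The index of "plàcia" (5) never appears in any step, which is why its "V" tag has no effect on the result.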