I have a page like this (a speech or dialogue page in which each speaker's name appears in bold, followed by the paragraphs of that speaker's speech):
<body>
<p>
<b>
speaker abc:
</b>
some wanted text here
</p>
<p>
some other text wanted, maybe containing speaker abc
</p>
<p>
some other text wanted, maybe containing speaker cde
</p>
<p>
some other text wanted
</p>
<p>
<b>
speaker cde (can be random):
</b>
</p>
<p>
some other text UNwanted, maybe containing speaker abc
</p>
<p>
some other text UNwanted, maybe containing speaker cde
</p>
<p>
some other text UNwanted
</p>
<p>
<b>
speaker abc:
</b>
</p>
<p>
some other text wanted
</p>
<p>
<b>
speaker fgh:
</b>
</p>
<p>
some other text UNwanted
</p>
</body>
I would like to select (using XPath) all the text marked as wanted in the example, i.e. everything spoken by one particular speaker, say abc.
I am not very fluent in XPath and HTML. I suspect an axis is needed, but I am struggling to figure out how to use it.
CodePudding user response:
The following XPath will do this:
"//*[preceding-sibling::p[contains(.,'speaker abc')] and following-sibling::p[contains(.,'speaker cde')]]"
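This limits the wanted p nodes to those that have a preceding-sibling p node containing the wanted speaker name and a following-sibling p node containing the next, unwanted speaker name. Both conditions only test that such a sibling exists somewhere before or after the node, not that it is the nearest speaker label.
The output is: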
some other text wanted, maybe containing abc
some other text wanted, maybe containing cde
some other text wanted
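Because the sibling test above only checks for existence, it can pick up paragraphs outside the intended speech when a speaker name recurs later on the page or appears inside someone else's text. One way around this is to decide each paragraph by its nearest preceding bold label. An XPath 1.0 expression with that intent might be "//p[(self::p[b] | preceding-sibling::p[b])[last()][contains(b, 'speaker abc')]]", though it is not run here. The same logic can be sketched in Python with only the standard library. The sample below is a shortened stand-in for the page in the question, and the name speeches_by is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the page from the question.
SAMPLE = """<body>
<p><b>speaker abc:</b> some wanted text here</p>
<p>some other text wanted, maybe containing speaker cde</p>
<p><b>speaker cde (can be random):</b></p>
<p>some other text UNwanted, maybe containing speaker abc</p>
<p><b>speaker abc:</b></p>
<p>some other text wanted</p>
<p><b>speaker fgh:</b></p>
<p>some other text UNwanted</p>
</body>"""


def speeches_by(body, speaker):
    """Return the paragraph texts that belong to `speaker`."""
    wanted = []
    current = None  # label of the most recent <b> heading seen so far
    for p in body.findall("p"):
        label = p.find("b")
        if label is not None:
            # A bold label starts a new speech; remember who is talking.
            current = "".join(label.itertext()).strip()
            # Text after </b> in the same paragraph is part of the speech.
            text = (label.tail or "").strip()
        else:
            text = "".join(p.itertext()).strip()
        # Assumption: a label belongs to the speaker if it starts with the name.
        if text and current is not None and current.startswith(speaker):
            wanted.append(text)
    return wanted


result = speeches_by(ET.fromstring(SAMPLE), "speaker abc")
print(result)
```

The prefix match on the label is a simplification: a label such as "speaker abcd:" would also match, so a stricter comparison may be needed on real pages.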
CodePudding user response:
This is very difficult to do using XPath 1.0 alone.
In XSLT 2.0, use positional grouping:
<xsl:for-each-group select="p" group-starting-with="p[b]">...</xsl:for-each-group>
and then select the groups you are interested in.
If you have to do it using XPath 1.0, consider pre-processing the input using XSLT to split the text into speeches, using xsl:for-each-group as suggested.
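If an XSLT 2.0 processor is not available, the positional grouping described above can be approximated in a host language. The following is a minimal Python sketch, assuming the page parses as well-formed XML. The helper group_starting_with and the sample PAGE are hypothetical and only mimic the behaviour of group-starting-with="p[b]".

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the page in the question.
PAGE = """<body>
<p><b>speaker abc:</b> first speech</p>
<p>still abc</p>
<p><b>speaker cde:</b></p>
<p>cde talking</p>
<p><b>speaker abc:</b></p>
<p>abc again</p>
</body>"""


def group_starting_with(items, starts_group):
    """Split `items` into runs, opening a new run at each item for which
    `starts_group(item)` is true. Items before the first such item form
    their own run."""
    groups = []
    for item in items:
        if starts_group(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


paragraphs = ET.fromstring(PAGE).findall("p")
groups = group_starting_with(paragraphs, lambda p: p.find("b") is not None)

# Keep only the groups whose first paragraph carries the wanted bold label.
abc_groups = [
    g for g in groups
    if g[0].find("b") is not None and "speaker abc" in g[0].find("b").text
]
group_sizes = [len(g) for g in groups]
# Whitespace-normalised text of the last paragraph in each selected group.
abc_last_lines = [" ".join("".join(g[-1].itertext()).split()) for g in abc_groups]
print(group_sizes, abc_last_lines)
```

Once the paragraphs are split into speeches this way, selecting one speaker is a filter on the first paragraph of each group, which is the step the answer describes as selecting the groups you are interested in.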