I am working with TEI documents with the following structure:
<body>
<pb n="1"/>
<head>text text <lb/>text text<lb/>text text <ref target="n1">
<hi rend="super">1</hi>
</ref>
</head>
<byline>text</byline>
<note type="bio" place="bottom">text text...</note>
<div>
<p>text, <title>text</title> text <title>text</title> text text text<title>Saved</title> text text text</p>
<q>text text text text<ref target="n2">
<hi rend="super">2</hi>
</ref></q>
<p>text <title>text</title> and <title>text, </title>
</p>
</div>
<pb n="2"/>
<div>
<p>
<title>text</title> text text text</p>
<p> text text text text <hi rend="italic">text,</hi> "text." text text text <hi rend="italic">text,</hi> text text text<ref target="n3">
<hi rend="super">3</hi>
</ref> text text text <hi rend="italic">text</hi> text text text <hi rend="italic">text</hi> text text text text<ref target="n4">
<hi rend="super">4</hi>
</ref> text text text text</p>
<p>text text text text <hi rend="italic">text text text text<ref target="n5">
<hi rend="super">5</hi>
</ref></hi> text text text text text </p>
</div>
<pb n="3"/> text text text...
</body>
I need to wrap only the text between each pb element in a page element. There is a very similar post on Stack Overflow, "XSLT wrap nodes between specific element", from which I adapted the accepted answer. The problem is that it copies all descendant nodes to the output. I want only the text returned, with all other elements such as <head>, <byline>, or <p> removed. Only the text values need to be copied.
Here's my XSLT:
<xsl:template match="tei:text/tei:body">
<text xmlns="http://digital.library.ptsem.edu/ptsl" type="ocr" source="tei">
<xsl:variable name="parent" select="."/>
<xsl:for-each-group select="descendant::node()" group-starting-with="tei:pb[@n]">
<page number="{@n}" xmlns="http://digital.library.ptsem.edu/ptsl">
<xsl:apply-templates select="$parent/node()[descendant-or-self::node() intersect current-group()]" mode="subtree"/>
</page>
</xsl:for-each-group>
</text>
</xsl:template>
<xsl:template match="tei:pb[@n]" mode="subtree"/>
<xsl:template match="node()" mode="subtree">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="node()[descendant-or-self::node() intersect current-group()]" mode="subtree"/>
</xsl:copy>
</xsl:template>
Returned result is:
<?xml version="1.0" encoding="UTF-8"?>
<text>
<page number="1">
<head>text text <lb/>text text<lb/>text text <ref target="n1">
<hi rend="super">1</hi>
</ref>
</head>
<byline>text</byline>
<note type="bio" place="bottom">text text...</note>
<div>
<p>text, <title>text</title> text <title>text</title> text text text<title>Saved</title> text text text</p>
<q>text text text text<ref target="n2">
<hi rend="super">2</hi>
</ref></q>
<p>text <title>text</title> and <title>text, </title>
</p>
</div>
</page>
<page number="2">
<div>
<p>
<title>text</title> text text text</p>
<p> text text text text <hi rend="italic">text,</hi> "text." text text text <hi rend="italic">text,</hi> text text text<ref target="n3">
<hi rend="super">3</hi>
</ref> text text text <hi rend="italic">text</hi> text text text <hi rend="italic">text</hi> text text text text<ref target="n4">
<hi rend="super">4</hi>
</ref> text text text text</p>
<p>text text text text <hi rend="italic">text text text text<ref target="n5">
<hi rend="super">5</hi>
</ref></hi> text text text text text </p>
</div>
</page>
<page number="3"> text text text... </page>
</text>
Desired result is:
<text>
<page number="1">text text text text text text 1 text text text...
text, text text text text text text Saved text text text
text text text text 2 text text and text,
</page>
<page number="2">text text text text
text text text text text, "text." text text text text, text text text 3
text text text text text text text text text text text text 4
text text text text text text text text text text text text 5
text text text text text
</page>
<page number="3"> text text text... </page>
</text>
User response:
It seems that using
<page number="{@n}" xmlns="http://digital.library.ptsem.edu/ptsl">
<xsl:copy-of select="current-group()[self::text()]"/>
</page>
instead of
<page number="{@n}" xmlns="http://digital.library.ptsem.edu/ptsl">
<xsl:apply-templates select="$parent/node()[descendant-or-self::node() intersect current-group()]" mode="subtree"/>
</page>
should be enough to output just the grouped text nodes.
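For reference, here is a minimal sketch of how the whole body template might look with that change applied. It assumes an XSLT 2.0 processor (xsl:for-each-group is not available in 1.0) and that the tei prefix is bound to the usual TEI namespace http://www.tei-c.org/ns/1.0; the question does not show its namespace declarations, so adjust them to match your stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <!-- Assumption: tei is bound to the standard TEI namespace -->

  <xsl:template match="tei:text/tei:body">
    <text xmlns="http://digital.library.ptsem.edu/ptsl" type="ocr" source="tei">
      <!-- Flatten every descendant node into one sequence;
           each pb that has an n attribute starts a new group -->
      <xsl:for-each-group select="descendant::node()"
                          group-starting-with="tei:pb[@n]">
        <!-- The context item is the first node of the group,
             normally the pb, so @n supplies the page number -->
        <page number="{@n}">
          <!-- Keep only the text nodes of the group; the element
               wrappers (head, p, hi, ref, ...) are not copied -->
          <xsl:copy-of select="current-group()[self::text()]"/>
        </page>
      </xsl:for-each-group>
    </text>
  </xsl:template>

</xsl:stylesheet>
```

With this version the $parent variable and the two mode="subtree" templates are no longer used and can be removed. One thing to watch for: any node that precedes the first pb, including a whitespace-only text node right after the body start tag, forms a leading group of its own whose page has an empty number attribute, so you may want to skip groups whose first node is not a pb.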