I need to convert a batch of Word documents (Word 2003 XML; 30 documents of approx. 80 pages each) to DocBook 5.1. I have written a stylesheet for this purpose and it works so far. However, I am stuck on the following problem:
The documents contain many lists. The Word XML marks each list item (<w:listPr>), but as far as I can see it does not indicate where a list begins and ends. There are only individual list items.
In XSLT I can capture the list items (<listitem>), but I don't know how to wrap them in the enclosing list element (<itemizedlist>).
One way could be to collect the lists with for-each-group or similar and copy the text content of the nodes into my target document. But the list items contain other formatting/elements, such as <w:instrText> (DocBook: <indexterm>), which should not be lost.
How can I handle this?
Word 2003 XML source (excerpt)
<w:p>
<w:pPr>
<w:pStyle w:val="2Standard"/>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="14"/>
<wx:t wx:val="·"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>die Prognose der Wirtschaft</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText> XE "Wirtschaft"</w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:p>
<w:p>
<w:pPr>
<w:pStyle w:val="2Standard"/>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="14"/>
<wx:t wx:val="·"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>die Beratung der Politik.</w:t>
</w:r>
</w:p>
Desired Output
<itemizedlist>
<listitem>
<para>die Prognose der Wirtschaft
<indexterm><primary>Wirtschaft</primary></indexterm>
</para>
</listitem>
<listitem>
<para>die Beratung der Politik.</para>
</listitem>
</itemizedlist>
First Stylesheet approach
<xsl:template match="w:p">
<xsl:choose>
<xsl:when test="w:pPr/w:listPr/w:ilvl/@w:val = '0'">
<listitem>
<para>
<xsl:apply-templates select="w:r"/>
</para>
</listitem>
</xsl:when>
<xsl:otherwise>
<para>
<xsl:apply-templates/>
</para>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="w:r">
<xsl:choose>
<xsl:when test="w:instrText">
<indexterm>
<primary>
<xsl:apply-templates select="*/text()"/>
</primary>
</indexterm>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="w:t"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Answer:
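I think it should be possible with xsl:for-each-group and group-adjacent, which wraps each run of adjacent list paragraphs in one <itemizedlist> while the paragraph content is still processed by templates (so child elements are not lost), along the lines of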
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xpath-default-namespace="http://example.com/"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:output method="xml" indent="yes" suppress-indentation="indexterm"/>
  <xsl:strip-space elements="*"/>

  <!-- Group adjacent paragraphs by whether they carry list properties -->
  <xsl:template match="root">
    <xsl:for-each-group select="p" group-adjacent="boolean(self::p[pPr/listPr])">
      <xsl:choose>
        <!-- A run of list paragraphs: wrap it in a single itemizedlist -->
        <xsl:when test="current-grouping-key()">
          <itemizedlist>
            <xsl:apply-templates select="current-group()" mode="list"/>
          </itemizedlist>
        </xsl:when>
        <!-- A run of ordinary paragraphs: process them as usual -->
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Each list paragraph becomes one listitem -->
  <xsl:template match="p" mode="list">
    <listitem>
      <para>
        <xsl:apply-templates mode="#current"/>
      </para>
    </listitem>
  </xsl:template>

  <!-- Field instructions inside a list item become index terms -->
  <xsl:template match="instrText" mode="list">
    <indexterm>
      <primary>
        <xsl:apply-templates mode="#current"/>
      </primary>
    </indexterm>
  </xsl:template>

</xsl:stylesheet>
This transforms
<w:root xmlns:w="http://example.com/" xmlns:wx="http://example.com/wx">
<w:p>
<w:pPr>
<w:pStyle w:val="2Standard"/>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="14"/>
<wx:t wx:val="·"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>die Prognose der Wirtschaft</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText> XE "Wirtschaft"</w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:p>
<w:p>
<w:pPr>
<w:pStyle w:val="2Standard"/>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="14"/>
<wx:t wx:val="·"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>die Beratung der Politik.</w:t>
</w:r>
</w:p>
</w:root>
into
<itemizedlist>
<listitem>
<para>die Prognose der Wirtschaft<indexterm><primary> XE "Wirtschaft"</primary></indexterm>
</para>
</listitem>
<listitem>
<para>die Beratung der Politik.</para>
</listitem>
</itemizedlist>
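Note that the <primary> above still contains the raw field code ( XE "Wirtschaft") rather than the bare term asked for in the question. A minimal sketch of one way to extract the term follows. It assumes that index fields always have the simple form XE "term" with straight double quotes and no subentries or switches, and it keeps the placeholder namespace http://example.com/ from the sample. Other field types are not handled here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://example.com/"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:output method="xml" indent="yes" suppress-indentation="indexterm"/>
  <xsl:strip-space elements="*"/>

  <!-- Same grouping as before: one itemizedlist per run of list paragraphs -->
  <xsl:template match="root">
    <xsl:for-each-group select="p" group-adjacent="boolean(pPr/listPr)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <itemizedlist>
            <xsl:apply-templates select="current-group()" mode="list"/>
          </itemizedlist>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="p" mode="list">
    <listitem>
      <para>
        <xsl:apply-templates mode="#current"/>
      </para>
    </listitem>
  </xsl:template>

  <!-- Assumption: an index field reads  XE "term"  with straight quotes.
       Only instrText elements of that shape match this template. -->
  <xsl:template mode="list"
      match="instrText[matches(., '^\s*XE\s+&quot;[^&quot;]*&quot;')]">
    <indexterm>
      <primary>
        <!-- Keep only the text between the first pair of quotes -->
        <xsl:value-of
          select="replace(., '^\s*XE\s+&quot;([^&quot;]*)&quot;.*$', '$1', 's')"/>
      </primary>
    </indexterm>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input, the first list item should then contain <indexterm><primary>Wirtschaft</primary></indexterm>. Any instrText that does not match the assumed XE pattern falls through to the built-in templates, so its text is copied as is.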
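Please consider providing namespace-well-formed samples/snippets next time. The question did not declare the w and wx prefixes, so the sample above binds them to placeholder namespace URIs, and xpath-default-namespace in the stylesheet has to be changed to the namespace your documents actually use.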
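One caveat about grouping on the mere presence of listPr: two different lists that directly follow each other end up in a single <itemizedlist>. A possible refinement, sketched here under the assumption that all paragraphs of one list share the same w:ilfo value (as they do in the sample), is to use that value as the grouping key. The w prefix is bound to the placeholder namespace from the sample because attributes are not covered by xpath-default-namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://example.com/"
  xpath-default-namespace="http://example.com/"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:output method="xml" indent="yes" suppress-indentation="indexterm"/>
  <xsl:strip-space elements="*"/>

  <!-- Grouping key: the list id (w:ilfo) of the paragraph, or the empty
       string for paragraphs that are not list items. A change of id
       therefore starts a new group, and so a new list. -->
  <xsl:template match="root">
    <xsl:for-each-group select="p"
        group-adjacent="string((pPr/listPr/ilfo/@w:val)[1])">
      <xsl:choose>
        <xsl:when test="current-grouping-key() ne ''">
          <itemizedlist>
            <xsl:apply-templates select="current-group()" mode="list"/>
          </itemizedlist>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- List paragraph: one listitem -->
  <xsl:template match="p" mode="list">
    <listitem>
      <para>
        <xsl:apply-templates mode="#current"/>
      </para>
    </listitem>
  </xsl:template>

  <!-- Ordinary paragraph: a plain para -->
  <xsl:template match="p">
    <para>
      <xsl:apply-templates/>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

This sketch only separates adjacent lists. It does not nest lists by w:ilvl, and it leaves out the index term handling to stay short.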