We are not getting separate page and text elements as expected in the XML output of our XSL transformation

Time: 10-10

XSL is a language for expressing style sheets. Like a CSS style sheet, an XSL style sheet is a file that describes how to display an XML document of a given type. Here I want to use XSL to convert a complex XML document into a simpler XML document.

The XML file I get from ABBYY FineReader is too complex for my needs, so I want to convert it into simplified XML. I have written an XSL file to transform src.xml into target.xml, but the output is not what I expect: all content ends up in a single page, and every text element contains the paragraphs of the whole document.

If anyone has an idea about how to fix this, please help.

Here is the complex XML file that I want to convert into simplified XML:

Source XML (src.xml):

        <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
        <document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 12" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
        <page width="294" height="189" resolution="120" originalCoords="1">
        <block blockType="Text" blockName="" l="0" t="5" r="272" b="185"><region><rect l="0" t="5" r="272" b="185"/></region>
        <text>
        <par lineSpacing="2410">
        <line baseline="30" l="1" t="6" r="72" b="30"><formatting lang="EnglishUnitedStates">hello</formatting></line></par>
        <par lineSpacing="1840">
        <line baseline="87" l="0" t="69" r="179" b="87"><formatting lang="EnglishUnitedStates">this is a website</formatting></line></par>
        <par lineSpacing="1260">
        <line baseline="136" l="0" t="122" r="269" b="140"><formatting lang="EnglishUnitedStates">Is the writing getting smaller?</formatting></line></par>
        <par lineSpacing="1260">
        <line baseline="182" l="0" t="169" r="133" b="182"><formatting lang="EnglishUnitedStates">IM SHRINKING</formatting></line></par>
        </text>
        <text>
        <par lineSpacing="2410">
        <line baseline="30" l="1" t="6" r="72" b="30"><formatting lang="EnglishUnitedStates">10</formatting></line></par>
        <par lineSpacing="1840">
        <line baseline="87" l="0" t="69" r="179" b="87"><formatting lang="EnglishUnitedStates">20</formatting></line></par>
        <par lineSpacing="1260">
        <line baseline="136" l="0" t="122" r="269" b="140"><formatting lang="EnglishUnitedStates">30</formatting></line></par>
        <par lineSpacing="1260">
        <line baseline="182" l="0" t="169" r="133" b="182"><formatting lang="EnglishUnitedStates">40</formatting></line></par>
        </text>
        </block>
        </page>
        
        <page width="294" height="189" resolution="120" originalCoords="1">
        <block blockType="Text" blockName="" l="0" t="5" r="272" b="185"><region><rect l="0" t="5" r="272" b="185"/></region>
        <text>
        <par lineSpacing="2410">
        <line baseline="30" l="1" t="6" r="72" b="30"><formatting lang="EnglishUnitedStates">hii</formatting></line></par>
        <par lineSpacing="1840">
        <line baseline="87" l="0" t="69" r="179" b="87"><formatting lang="EnglishUnitedStates">Demo for XSL</formatting></line></par>
        </text>
        </block>
        </page>
        </document>
    

Desired output

Here is the simplified XML that I want:

        <?xml version="1.0" encoding="UTF-8"?>
        <document>
            <page>
                <block blockType="Text">
                    <text>
                        <paragraph>
                            <line>hello</line>
                        </paragraph>
                        <paragraph>
                            <line>this is a website</line>
                        </paragraph>
                        <paragraph>
                            <line>Is the writing getting smaller?</line>
                        </paragraph>
                        <paragraph>
                            <line>IM SHRINKING</line>
                        </paragraph>
                    </text>
                    <text>
                        <paragraph>
                            <line>10</line>
                        </paragraph>
                        <paragraph>
                            <line>20</line>
                        </paragraph>
                        <paragraph>
                            <line>30</line>
                        </paragraph>
                        <paragraph>
                            <line>40</line>
                        </paragraph>
                    </text>
                </block>
            </page>
            <page>
                <block blockType="Text">
                    <text>
                        <paragraph>
                            <line>hii</line>
                        </paragraph>
                        <paragraph>
                            <line>Demo for XSL</line>
                        </paragraph>
                    </text>
                </block>
            </page>
        </document>
    

XSL Code

Here is the XSL stylesheet I use to convert the complex XML into simple XML:

        <?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="2.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
            <xsl:output method="xml" indent="yes"/>
            <xsl:template match="/">
                <document>
                    <page>
                        <block>
                            <xsl:variable name="blockType" select="/document/page/block/@blockType"/>
                            <!-- The variable blockType can be used for further processing.  -->
                            <xsl:attribute name="blockType"><xsl:value-of select="$blockType"/></xsl:attribute>
                           <xsl:for-each select="/document/page/block/text">
                           <text>
                                <xsl:for-each select="/document/page/block/text/par">
                                    <paragraph>
                                        <line>
                                            <xsl:value-of   select="./line"/>
                                        </line>
                                    </paragraph>
                                </xsl:for-each>
                            </text>
                            </xsl:for-each>
                        </block>
                    </page>
                </document>
            </xsl:template>
        </xsl:stylesheet>
    

Actual output

        <?xml version="1.0" encoding="UTF-8"?>
        <document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
           <page>
              <block blockType="Text Text">
                 <text>
                    <paragraph>
                       <line>hello</line>
                    </paragraph>
                    <paragraph>
                       <line>this is a website</line>
                    </paragraph>
                    <paragraph>
                       <line>Is the writing getting smaller?</line>
                    </paragraph>
                    <paragraph>
                       <line>IM SHRINKING</line>
                    </paragraph>
                    <paragraph>
                       <line>10</line>
                    </paragraph>
                    <paragraph>
                       <line>20</line>
                    </paragraph>
                    <paragraph>
                       <line>30</line>
                    </paragraph>
                    <paragraph>
                       <line>40</line>
                    </paragraph>
                    <paragraph>
                       <line>hii</line>
                    </paragraph>
                    <paragraph>
                       <line>Demo for XSL</line>
                    </paragraph>
                 </text>
                 <text>
                    <paragraph>
                       <line>hello</line>
                    </paragraph>
                    <paragraph>
                       <line>this is a website</line>
                    </paragraph>
                    <paragraph>
                       <line>Is the writing getting smaller?</line>
                    </paragraph>
                    <paragraph>
                       <line>IM SHRINKING</line>
                    </paragraph>
                    <paragraph>
                       <line>10</line>
                    </paragraph>
                    <paragraph>
                       <line>20</line>
                    </paragraph>
                    <paragraph>
                       <line>30</line>
                    </paragraph>
                    <paragraph>
                       <line>40</line>
                    </paragraph>
                    <paragraph>
                       <line>hii</line>
                    </paragraph>
                    <paragraph>
                       <line>Demo for XSL</line>
                    </paragraph>
                 </text>
                 <text>
                    <paragraph>
                       <line>hello</line>
                    </paragraph>
                    <paragraph>
                       <line>this is a website</line>
                    </paragraph>
                    <paragraph>
                       <line>Is the writing getting smaller?</line>
                    </paragraph>
                    <paragraph>
                       <line>IM SHRINKING</line>
                    </paragraph>
                    <paragraph>
                       <line>10</line>
                    </paragraph>
                    <paragraph>
                       <line>20</line>
                    </paragraph>
                    <paragraph>
                       <line>30</line>
                    </paragraph>
                    <paragraph>
                       <line>40</line>
                    </paragraph>
                    <paragraph>
                       <line>hii</line>
                    </paragraph>
                    <paragraph>
                       <line>Demo for XSL</line>
                    </paragraph>
                 </text>
              </block>
           </page>
        </document>

Answer:

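Why not simply nest one xsl:for-each per level (page, block, text, par), with every select path relative to the current node: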

XSLT 2.0

        <xsl:stylesheet version="2.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
            <xsl:output method="xml" indent="yes"/>

            <xsl:template match="/document">
                <document>
                    <xsl:for-each select="page">
                        <page>
                            <xsl:for-each select="block">
                                <block blockType="{@blockType}">
                                    <xsl:for-each select="text">
                                        <text>
                                            <xsl:for-each select="par">
                                                <paragraph>
                                                    <line>
                                                        <xsl:value-of select="line"/>
                                                    </line>
                                                </paragraph>
                                            </xsl:for-each>
                                        </text>
                                    </xsl:for-each>
                                </block>
                            </xsl:for-each>
                        </page>
                    </xsl:for-each>
                </document>
            </xsl:template>

        </xsl:stylesheet>
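The stylesheet above relies on `xpath-default-namespace`, which was introduced in XSLT 2.0. A 1.0-only processor (for example xsltproc/libxslt or .NET's `XslCompiledTransform`) will not apply it, so unprefixed names such as `document` and `page` will not match the namespaced FineReader elements. A sketch of the same transform for XSLT 1.0 binds the FineReader namespace to a prefix instead; `fr` is an arbitrary prefix chosen here for illustration, and any prefix bound to the same URI works:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fr="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
    exclude-result-prefixes="fr">
    <!-- "fr" is an arbitrary prefix bound to the FineReader namespace URI -->
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/fr:document">
        <document>
            <!-- Each select is relative to the current node, one level at a time -->
            <xsl:for-each select="fr:page">
                <page>
                    <xsl:for-each select="fr:block">
                        <!-- Attributes are not in a namespace, so @blockType needs no prefix -->
                        <block blockType="{@blockType}">
                            <xsl:for-each select="fr:text">
                                <text>
                                    <xsl:for-each select="fr:par">
                                        <paragraph>
                                            <line>
                                                <xsl:value-of select="fr:line"/>
                                            </line>
                                        </paragraph>
                                    </xsl:for-each>
                                </text>
                            </xsl:for-each>
                        </block>
                    </xsl:for-each>
                </page>
            </xsl:for-each>
        </document>
    </xsl:template>
</xsl:stylesheet>
```

`exclude-result-prefixes="fr"` keeps an unused `xmlns:fr` declaration out of the result document; the output elements themselves are in no namespace, as in the desired output.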

Or perhaps even simpler:

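        <xsl:stylesheet version="2.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
            <xsl:output method="xml" indent="yes"/>
            <!-- Drop whitespace-only text nodes from the source so the
                 built-in text template does not copy them into the result -->
            <xsl:strip-space elements="*"/>

            <!-- Default: rebuild every element under its local name,
                 without its attributes and in no namespace -->
            <xsl:template match="*">
                <xsl:element name="{local-name()}">
                    <xsl:apply-templates/>
                </xsl:element>
            </xsl:template>

            <!-- Keep only blockType, and skip region/rect -->
            <xsl:template match="block">
                <block blockType="{@blockType}">
                    <xsl:apply-templates select="text"/>
                </block>
            </xsl:template>

            <!-- Rename par to paragraph and keep only the line text -->
            <xsl:template match="par">
                <paragraph>
                    <line>
                        <xsl:value-of select="line"/>
                    </line>
                </paragraph>
            </xsl:template>

        </xsl:stylesheet>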
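As for why the stylesheet in the question produced that output, as far as I can tell from the code: the inner `xsl:for-each select="/document/page/block/text/par"` starts with `/`, so it is evaluated from the document root instead of from the current `text` element, and it returns all 10 `par` elements for each of the 3 `text` elements. The template also writes a single literal `<page>` and `<block>` instead of iterating over them, so everything lands in one page. `/document/page/block/@blockType` selects the attribute of both blocks, and in XSLT 2.0 `xsl:value-of` joins a sequence with a single space, hence `blockType="Text Text"`. Finally, `xmlns:xsi` shows up in the output because namespaces declared on the stylesheet are copied to literal result elements unless they are listed in `exclude-result-prefixes`.

The small diagnostic stylesheet below is a sketch that makes the first point visible by counting `par` elements both ways; the `report` element and the `absolute`/`relative` attribute names are made up for this demo:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <!-- report, absolute and relative are hypothetical names for this demo -->
        <report>
            <xsl:for-each select="/document/page/block/text">
                <!-- absolute: the leading "/" restarts at the document root,
                     so the count is the same for every text element.
                     relative: evaluated from the current text element,
                     so only its own par children are counted. -->
                <text absolute="{count(/document/page/block/text/par)}"
                      relative="{count(par)}"/>
            </xsl:for-each>
        </report>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input this should report `absolute="10"` on all three `text` elements, against `relative` values of 4, 4 and 2, which matches the ten paragraphs repeated in each `text` element of the actual output.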