Merge and manipulate xslt file using python lxml

Time:12-18

I'm a newbie in Python and I have a difficult task to cope with. Suppose we have two XSLT files; the first one is like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:function name="grp:MapToCD538A_var107">
        <xsl:param name="var106_cur" as="node()"/>
    </xsl:function>
    <xsl:template match="/">
        <CD123>
            <xsl:attribute name="xsi:schemaLocation" namespace="http://www.w3.org/2001/XMLSchema-instance"/>
            <xsl:for-each select="(./ns0:CD538C)[fn:not(fn:exists(*:ExportOperation[fn:namespace-uri() eq '']/*:requestRejectionReasonCode[fn:namespace-uri() eq '']))]">
                <SynIde xmlns="">UN1OC</SynIde>
                <SynVer xmlns="">
                    <xsl:sequence select="xs:string(xs:integer('3'))"/>
                </SynVer>
            </xsl:for-each>
        </CD123>
    </xsl:template>
</xsl:stylesheet>

and the second one is like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" byte-order-mark="no" indent="yes"/>
    <xsl:template match="/">
        <CD96A>
            <xsl:attribute name="xsi:schemaLocation" namespace="http://www.w3.org/2001/XMLSchema-instance"/>
            <xsl:for-each select="(./ns0:CD538C)[fn:exists(*:ExportOperation[fn:namespace-uri() eq '']/*:requestRejectionReasonCode[fn:namespace-uri() eq ''])]">
                <SynIdeMES1 xmlns="">UNOC</SynIdeMES1>
                <SynVerNumMES2 xmlns="">
                    <xsl:sequence select="xs:string(xs:integer('3'))"/>
                </SynVerNumMES2>
            </xsl:for-each>
        </CD96A>
    </xsl:template>
</xsl:stylesheet>

Now comes the tricky part: the merge. I want to merge these two files into one, with the following output:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:function name="grp:MapToCD538A_var107">
        <xsl:param name="var106_cur" as="node()"/>
    </xsl:function>
    <xsl:template match="/">
        <xsl:for-each select="(./ns0:CD538C)[fn:not(fn:exists(*:ExportOperation[fn:namespace-uri() eq '']/*:requestRejectionReasonCode[fn:namespace-uri() eq '']))]">
            <CD123>
                <xsl:attribute name="xsi:schemaLocation" namespace="http://www.w3.org/2001/XMLSchema-instance"/>
                <SynIde xmlns="">UN1OC</SynIde>
                <SynVer xmlns="">
                    <xsl:sequence select="xs:string(xs:integer('3'))"/>
                </SynVer>
            </CD123>
        </xsl:for-each>
        <xsl:for-each select="(./ns0:CD538C)[fn:exists(*:ExportOperation[fn:namespace-uri() eq '']/*:requestRejectionReasonCode[fn:namespace-uri() eq ''])]">
            <CD96A>
                <xsl:attribute name="xsi:schemaLocation" namespace="http://www.w3.org/2001/XMLSchema-instance"/>
                <SynIdeMES1 xmlns="">UNOC</SynIdeMES1>
                <SynVerNumMES2 xmlns="">
                    <xsl:sequence select="xs:string(xs:integer('3'))"/>
                </SynVerNumMES2>
            </CD96A>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

As you can see, there is a single <xsl:template match="/">. Directly inside it comes the first xsl:for-each, with the first message's root element (<CD123>) and its content nested under it. After the first for-each comes the second xsl:for-each for the second message, which contains its root element (<CD96A>) and its content.
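To pin the target shape down, here is a minimal sketch of a structural check using lxml XPath: it passes only if the stylesheet has exactly one `match="/"` template and each message root element sits directly inside an `xsl:for-each` that is itself a direct child of that template. The helper name `has_target_shape` and the trimmed-down sample stylesheets (with placeholder `select` values) are hypothetical, for illustration only.

```python
from lxml import etree

XSL = "http://www.w3.org/1999/XSL/Transform"
NS = {"xsl": XSL}


def has_target_shape(stylesheet_root, message_roots):
    """True if there is exactly one match="/" template and every name in
    message_roots is a direct child of an xsl:for-each that is itself a
    direct child of that template (hypothetical helper)."""
    templates = stylesheet_root.xpath("xsl:template[@match='/']", namespaces=NS)
    if len(templates) != 1:
        return False
    # Element children of the template's top-level for-each loops
    found = [e.tag for e in templates[0].xpath("xsl:for-each/*", namespaces=NS)]
    return all(name in found for name in message_roots)


# Simplified stand-ins: the desired layout vs. the original nested layout
desired = etree.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL}" version="1.0">'
    '<xsl:template match="/">'
    '<xsl:for-each select="a"><CD123/></xsl:for-each>'
    '<xsl:for-each select="b"><CD96A/></xsl:for-each>'
    '</xsl:template></xsl:stylesheet>'
)
nested = etree.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL}" version="1.0">'
    '<xsl:template match="/">'
    '<CD123><xsl:for-each select="a"/></CD123>'
    '</xsl:template></xsl:stylesheet>'
)

print(has_target_shape(desired, ["CD123", "CD96A"]))  # True
print(has_target_shape(nested, ["CD123"]))            # False
```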

I have tried using the lxml library, since it's recommended for XML manipulation:

from lxml import etree

# Parse the first XSLT file
xslt_doc_1 = etree.parse("first file.xslt")

# Find the root element of the first XSLT file
root_1 = xslt_doc_1.getroot()

# Parse the second XSLT file
xslt_doc_2 = etree.parse("second file.xslt")

# Find the root element of the second XSLT file
root_2 = xslt_doc_2.getroot()

# Move the children of the second root (its top-level elements) under the first root
root_1.extend(root_2)

# Write the merged XSLT file to a new file
with open("merged_xslt_file.xslt", "w") as f:
    f.write(etree.tostring(xslt_doc_1, pretty_print=True).decode())

and then tried to manipulate the output file, but with no success. Do you know how to achieve the desired output?
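For reference, a minimal sketch of what `root_1.extend(root_2)` actually does, using hypothetical trimmed-down stand-ins for the two files: iterating over an lxml element yields its children, and since an lxml element can only have one parent, each child is moved rather than copied. The result is one stylesheet with two competing `xsl:template match="/"` elements, not the nested structure shown above.

```python
from lxml import etree

XSL = "http://www.w3.org/1999/XSL/Transform"
NS = {"xsl": XSL}

# Hypothetical minimal stand-ins for the two input files
first = etree.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL}" version="1.0">'
    '<xsl:template match="/"><CD123/></xsl:template>'
    '</xsl:stylesheet>'
)
second = etree.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL}" version="1.0">'
    '<xsl:output method="xml"/>'
    '<xsl:template match="/"><CD96A/></xsl:template>'
    '</xsl:stylesheet>'
)

# extend() iterates over `second`, i.e. over its children, and moves each one
first.extend(second)

templates = first.xpath("xsl:template[@match='/']", namespaces=NS)
print(len(templates))  # 2: two separate templates for "/"
print(len(second))     # 0: the children were moved, not copied
```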

Answer:

As already noted in comments, these are XSLT 2.0 stylesheets mislabelled as XSLT 1.0, but that's not relevant to the problem because you're just treating them as data.

However, if XSLT technology is already in the mix, it seems very odd to be doing a transformation using Python rather than in XSLT.

Again, as already noted in comments, one example of an input and a desired output does not constitute a specification. To write the code for a transformation (or for any program!) we need to know what all the possible inputs are, and to know the rules for dealing with each of them. (If these two files were the only possible input, you could do the transformation manually by copy-and-paste -- indeed, you've already done it).

It's possible that the transformation you want is something like this:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0"
    xmlns:r="http://alias.namespace/">

  <xsl:variable name="in1" select="doc('first.xsl')"/>
  <xsl:variable name="in2" select="doc('second.xsl')"/>

  <xsl:namespace-alias stylesheet-prefix="r" result-prefix="xsl"/>
  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <r:transform version="2.0">
      <xsl:copy-of select="($in1/*/*, $in2/*/*)
                             [not(self::xsl:template[@match='/'])]"/>
      <r:template match="/">
        <r:for-each select="(./ns0:CD538C)
               [not(exists(*:ExportOperation[namespace-uri() eq
                 '']/*:requestRejectionReasonCode[namespace-uri() eq '']))]">
          <xsl:copy-of select="$in1//xsl:template[@match='/']/*"/>
        </r:for-each>
        <r:for-each select="(./ns0:CD538C)
               [exists(*:ExportOperation[namespace-uri() eq
                 '']/*:requestRejectionReasonCode[namespace-uri() eq ''])]">
          <xsl:copy-of select="$in2//xsl:template[@match='/']/*"/>
        </r:for-each>
      </r:template>
    </r:transform>
  </xsl:template>
</xsl:transform>
  

But there's a lot of guesswork there so I might be wrong.

Answer:

Given you are using Python's lxml and XSLT scripts are XML files, consider actually running XSLT 1.0 twice to: 1) merge the .xslt documents and 2) manipulate the transformed merged document.

xslt_merge.xslt (using document() on second .xslt)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" encoding="utf-8"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="xsl:template">
     <xsl:copy>
        <xsl:apply-templates select="*|@*"/>
        <!-- COPY FROM OTHER DOCUMENT -->
        <xsl:copy-of select="document('Second_XSLT.xslt')/*/*[2]/*"/>
     </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
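The same merge step can also be sketched with lxml's tree API instead of a stylesheet. This is a hypothetical Python equivalent of xslt_merge.xslt, not part of the answer above: it picks the second file's template by `@match='/'` rather than by position (`*[2]`), and deep-copies the children so the second tree stays intact. The helper name `merge_templates` and the sample inputs are illustrative only.

```python
import copy

from lxml import etree

XSL = "http://www.w3.org/1999/XSL/Transform"
NS = {"xsl": XSL}


def merge_templates(first_root, second_root):
    """Append deep copies of the children of second_root's match="/"
    template to first_root's match="/" template (hypothetical helper)."""
    target = first_root.xpath("xsl:template[@match='/']", namespaces=NS)[0]
    source = second_root.xpath("xsl:template[@match='/']", namespaces=NS)[0]
    for child in source:
        target.append(copy.deepcopy(child))  # copy, so `second_root` is unchanged
    return first_root


first = etree.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL}" version="1.0">'
    '<xsl:function name="grp:MapToCD538A_var107"/>'
    '<xsl:template match="/"><CD123/></xsl:template>'
    '</xsl:stylesheet>'
)
second = etree.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL}" version="1.0">'
    '<xsl:output method="xml" indent="yes"/>'
    '<xsl:template match="/"><CD96A/></xsl:template>'
    '</xsl:stylesheet>'
)

merged = merge_templates(first, second)
print([e.tag for e in merged.xpath("xsl:template/*", namespaces=NS)])
# ['CD123', 'CD96A']
```

Selecting by `@match` avoids depending on `xsl:output` being the first top-level element of the second file.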

xslt_manipulate.xslt (adjusting hierarchy of xsl:for-each)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" encoding="utf-8"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>
    
    <xsl:template match="xsl:template">
     <xsl:copy>
        <!-- SKIP NODES ABOVE xsl:for-each -->
        <xsl:apply-templates select="@*|descendant::xsl:for-each"/>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="xsl:for-each">
     <xsl:copy>
        <xsl:apply-templates select="@*"/>
        <!-- PULL PARENT AND SIBLING NODES -->
        <xsl:element name="{name(..)}">
            <xsl:apply-templates select="preceding-sibling::*[1]"/>       
            <xsl:apply-templates select="*"/>
        </xsl:element>
     </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
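Likewise, a hypothetical lxml sketch of the xslt_manipulate.xslt step: each `xsl:for-each` becomes a direct child of the template, and its former parent element (e.g. `<CD123>`) is recreated inside it. Like the stylesheet, it assumes the only other content to keep from that parent is the single element immediately before the for-each (here the `xsl:attribute`); anything else in the parent is dropped. The helper name `hoist_for_each` and the sample template are illustrative only.

```python
from lxml import etree

XSL = "http://www.w3.org/1999/XSL/Transform"
NS = {"xsl": XSL}


def hoist_for_each(template):
    """Move every xsl:for-each up to be a direct child of `template` and
    recreate its former parent element inside it (hypothetical helper)."""
    loops = template.xpath("descendant::xsl:for-each", namespaces=NS)
    # Detach the current children; the `loops` references keep the
    # detached subtrees alive so we can still read from them below.
    for child in list(template):
        template.remove(child)
    for loop in loops:
        wrapper = loop.getparent()     # e.g. <CD123>
        previous = loop.getprevious()  # e.g. <xsl:attribute .../>
        new_loop = etree.SubElement(template, loop.tag, dict(loop.attrib))
        new_wrapper = etree.SubElement(new_loop, wrapper.tag)
        if previous is not None:
            new_wrapper.append(previous)  # moves it out of the old wrapper
        new_wrapper.extend(list(loop))    # moves the loop body
    return template


tpl = etree.fromstring(
    f'<xsl:template xmlns:xsl="{XSL}" match="/">'
    '<CD123><xsl:attribute name="a"/>'
    '<xsl:for-each select="x"><SynIde>UN1OC</SynIde></xsl:for-each></CD123>'
    '<CD96A><xsl:attribute name="a"/>'
    '<xsl:for-each select="y"><SynIdeMES1>UNOC</SynIdeMES1></xsl:for-each></CD96A>'
    '</xsl:template>'
)
hoist_for_each(tpl)
print(etree.tostring(tpl).decode())
```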

Python

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('First_XSLT.xslt')
xsl_merge = et.parse('xslt_merge.xslt')
xsl_manip = et.parse('xslt_manipulate.xslt')

# MERGE DOCUMENTS
transform = et.XSLT(xsl_merge)
result1 = transform(xml)

# MANIPULATE DOCUMENT
transform = et.XSLT(xsl_manip)
result2 = transform(result1)

# PRINT AND SAVE OUTPUT
print(result2)
result2.write_output("final.xslt")

Output final.xslt

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:function name="grp:MapToCD538A_var107">
    <xsl:param name="var106_cur" as="node()"/>
  </xsl:function>
  <xsl:template match="/">
    <xsl:for-each select="(./ns0:CD538C)[fn:not(fn:exists(*:ExportOperation[fn:namespace-uri() eq '']/*:requestRejectionReasonCode[fn:namespace-uri() eq '']))]">
      <CD123>
        <xsl:attribute name="xsi:schemaLocation" namespace="http://www.w3.org/2001/XMLSchema-instance"/>
        <SynIde xmlns="">UN1OC</SynIde>
        <SynVer xmlns="">
          <xsl:sequence select="xs:string(xs:integer('3'))"/>
        </SynVer>
      </CD123>
    </xsl:for-each>
    <xsl:for-each select="(./ns0:CD538C)[fn:exists(*:ExportOperation[fn:namespace-uri() eq '']/*:requestRejectionReasonCode[fn:namespace-uri() eq ''])]">
      <CD96A>
        <xsl:attribute name="xsi:schemaLocation" namespace="http://www.w3.org/2001/XMLSchema-instance"/>
        <SynIdeMES1 xmlns="">UNOC</SynIdeMES1>
        <SynVerNumMES2 xmlns="">
          <xsl:sequence select="xs:string(xs:integer('3'))"/>
        </SynVerNumMES2>
      </CD96A>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Note: the above solution does not check whether the final XSLT makes sense or complies with XSLT 1.0.
