How to create an XML diff with XSLT based on element id

Time:05-27

How can I create an XSLT (preferably 1.0) transformation that describes the differences between two files, based on an id element? Both input files follow the same format and contain items with several child elements, one of which is an id. The values of elements belonging to items with the same id should be compared. The input format does not use attributes. The result of the transformation should describe the type of each difference using attributes, as in the example below:

Old File:

<document>
    <item>
        <id>1</id>
        <element1>value1</element1>
        <element2>value2</element2>
    </item>
    <item>
        <id>2</id>
        <element1>value3</element1>
        <element2>value4</element2>
    </item>
    <item>
        <id>3</id>
        <element1>value5</element1>
        <element2>value6</element2>
    </item>
</document>

New File:

<document>
    <item>
        <id>1</id>
        <element1>value1</element1>
        <element2>other_value</element2>
    </item>
    <item>
        <id>2</id>
        <element1>value3</element1>
        <element2>value4</element2>
    </item>
    <item>
        <id>4</id>
        <element1>value7</element1>
        <element2>value8</element2>
    </item>
</document>

Result File:

<document>
    <item>
        <id>1</id>
        <element1>value1</element1>
        <element2 diff="changed" old="value2">other_value</element2>
    </item>
    <item>
        <id>2</id>
        <element1>value3</element1>
        <element2>value4</element2>
    </item>
    <item diff="removed">
        <id>3</id>
        <element1>value5</element1>
        <element2>value6</element2>
    </item>
    <item diff="added">
        <id>4</id>
        <element1>value7</element1>
        <element2>value8</element2>
    </item>
</document>

The solution should not be limited to a specific set of child elements.

CodePudding user response:

This is very awkward to do in XSLT, especially in version 1.0.

The following stylesheet will work for your example. It takes the old file as its input and reads the new file from new.xml. It is assumed that if a corresponding item exists in the new file, then both items have exactly the same child elements (though not necessarily with the same values), with unique names.

As I mentioned in the comments, using a dedicated diff tool would probably be a better choice.

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

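<!-- The new file is loaded here (relative to the stylesheet's location); the old file is the input document. -->
<xsl:param name="new-doc" select="document('new.xml')/document"/>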

<xsl:template match="/document">
    <xsl:copy>
        <xsl:apply-templates select="item"/>
        <xsl:apply-templates select="$new-doc/item[not(id=current()/item/id)]" mode="add"/> 
    </xsl:copy>
</xsl:template>

<xsl:template match="item">
    <xsl:variable name="new-item" select="$new-doc/item[id=current()/id]" />
    <xsl:choose>
        <xsl:when test="not($new-item)">
            <item diff="removed">
                <xsl:copy-of select="*"/>
            </item>
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy>
                <xsl:apply-templates/>  
            </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>   
</xsl:template>

<xsl:template match="item" mode="add">
     <item diff="added">
        <xsl:copy-of select="*"/>
    </item>
</xsl:template>

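<!-- Compare each child of an item that exists in both files with the same-named child of the matching new item. -->
<xsl:template match="item/*">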
    <xsl:variable name="new-elem" select="$new-doc/item/*[../id=current()/../id and name()=name(current())]" />
    <xsl:choose>
        <xsl:when test=". = $new-elem">
            <xsl:copy-of select="."/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy>
                <xsl:attribute name="diff">changed</xsl:attribute>
                <xsl:attribute name="old">
                    <xsl:value-of select="." />
                </xsl:attribute>
                <xsl:value-of select="$new-elem" />
            </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>   
</xsl:template>

</xsl:stylesheet>

CodePudding user response:

I'll start with XSLT 2.0 and leave you to look at how it might be adapted to 1.0.

First, start with grouping:

<xsl:for-each-group select="$doc1/item, $doc2/item" group-by="id">
  ...
</xsl:for-each-group>

Within the body:

  • if count(current-group()) = 1, the ID exists in only one file; you can work out which by testing (root(current-group()) is $doc1)

  • otherwise (the ID is present in both files), it rather depends on the set of possible differences you want to cater for. You've provided an example, but an example is not the same as a specification. If we assume that all the children of item are elements in the form of your example (<E>value</E>) and that each such element appears at most once, then you could do a further grouping of current-group()/*[not(self::id)] grouped by node-name(.), and:

      ◦ if the current-group() has two elements, compare their values using "=" or deep-equal()

      ◦ if it has only one element, report that as a difference.
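
Putting those steps together, here is a minimal sketch rather than a finished solution. It makes several assumptions that are not in the question: the old file is the principal input, the new file is called new.xml, ids are unique within each file, and $doc1/$doc2 are bound to the document elements (so the "which file?" test compares against root($doc1) rather than $doc1 itself). The diff="added"/"removed" markers on a child element that exists in only one file are my own choice of notation.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumption: old file = principal input, new file = new.xml -->
  <xsl:variable name="doc1" select="/document"/>
  <xsl:variable name="doc2" select="document('new.xml')/document"/>

  <xsl:template match="/">
    <document>
      <!-- One group per id; old items come first because of the comma operator -->
      <xsl:for-each-group select="$doc1/item, $doc2/item" group-by="id">
        <xsl:choose>
          <!-- The id occurs in only one file -->
          <xsl:when test="count(current-group()) = 1">
            <item diff="{if (root(.) is root($doc1)) then 'removed' else 'added'}">
              <xsl:copy-of select="*"/>
            </item>
          </xsl:when>
          <!-- The id occurs in both files: compare the children by name -->
          <xsl:otherwise>
            <xsl:variable name="old" select="current-group()[root(.) is root($doc1)]"/>
            <xsl:variable name="new" select="current-group()[root(.) is root($doc2)]"/>
            <item>
              <xsl:copy-of select="$old/id"/>
              <xsl:for-each-group
                  select="$old/*[not(self::id)], $new/*[not(self::id)]"
                  group-by="node-name(.)">
                <xsl:variable name="o" select="current-group()[root(.) is root($doc1)]"/>
                <xsl:variable name="n" select="current-group()[root(.) is root($doc2)]"/>
                <xsl:choose>
                  <!-- Child element present in only one of the two items -->
                  <xsl:when test="count(current-group()) = 1">
                    <xsl:copy>
                      <xsl:attribute name="diff"
                          select="if ($o) then 'removed' else 'added'"/>
                      <xsl:value-of select="."/>
                    </xsl:copy>
                  </xsl:when>
                  <!-- Same name, same content: copy unchanged -->
                  <xsl:when test="deep-equal($o, $n)">
                    <xsl:copy-of select="$n"/>
                  </xsl:when>
                  <!-- Same name, different content: new value plus the old one as an attribute -->
                  <xsl:otherwise>
                    <xsl:copy>
                      <xsl:attribute name="diff" select="'changed'"/>
                      <xsl:attribute name="old" select="string($o)"/>
                      <xsl:value-of select="$n"/>
                    </xsl:copy>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each-group>
            </item>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </document>
  </xsl:template>
</xsl:stylesheet>
```

This needs an XSLT 2.0 processor such as Saxon. The groups come out in order of first appearance, so items from the old file are listed first and items that exist only in the new file follow them.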
