I am trying to mask an XML document wherever certain tags are present. I have created a Java app that has saxon9he as a dependency.
<dependencies>
<dependency>
<groupId>net.sf.saxon</groupId>
<artifactId>saxon9he</artifactId>
<version>9.4.0.4</version>
</dependency>
</dependencies>
I have multiple use cases; some are straightforward, but some are conditional. Assume the <Prsn> tag given below is present at multiple different locations:
Input xml snippet
<ns3:Prsn>
<ns3:FrstNm>BDMFN</ns3:FrstNm>
<ns3:Nm>BDMSN</ns3:Nm>
<ns3:BirthDt>2000-01-02</ns3:BirthDt>
<ns3:Othr>
<ns3:Id>GB1592102</ns3:Id>
<ns3:SchmeNm>
<ns3:Cd>CCPT</ns3:Cd>
</ns3:SchmeNm>
</ns3:Othr>
</ns3:Prsn>
Transformation that is needed
In the XML above, some tags [FrstNm, Nm, BirthDt] need to be masked (remove the actual data from these tags and replace each character with #), which I have already achieved.
Need Help
The tricky part is the tag <Othr><SchmeNm><Cd>, which can have the values [NIND, CCPT, CONCAT]. When it has one of these values, we need to mask <Othr><Id>; for any other value in <Othr><SchmeNm><Cd>, <Othr><Id> must stay unchanged.
Transformation.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="*[local-name()='FrstNm']">
<xsl:copy>
<xsl:value-of select="replace(text(), '[A-Za-z]','#')" />
</xsl:copy>
</xsl:template>
<xsl:template match="*[local-name()='Nm']">
<xsl:copy>
<xsl:value-of select="replace(text(), '[A-Za-z]','#')" />
</xsl:copy>
</xsl:template>
<xsl:template match="*[local-name()='BirthDt']">
<xsl:copy>
<xsl:value-of select="replace(text(), '[0-9]','#')" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
CodePudding user response:
If you want to do regex-based search and replace, the minimum XSLT version you need is XSLT 2.0.
Also, don't use local-name(). Register a prefix for the namespace URI and use that. The prefix does not have to match the XML document, as long as the namespace URI is the same.
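To illustrate that only the namespace URI matters, here is a minimal Java sketch using the JDK's built-in XPath API instead of XSLT. The class name PrefixDemo and the method firstName are made up for this example; the prefix "person" is bound to the same placeholder URI used in this answer, while the document itself uses a different prefix.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: the XPath uses the prefix "person",
// the document uses whatever prefix it likes for the same URI.
class PrefixDemo {
    static String firstName(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                // Bind our own prefix to the document's namespace URI.
                return "person".equals(prefix)
                    ? "some-namespace-uri" : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        return xpath.evaluate("/person:Prsn/person:FrstNm", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns3:Prsn xmlns:ns3='some-namespace-uri'>"
            + "<ns3:FrstNm>BDMFN</ns3:FrstNm></ns3:Prsn>";
        System.out.println(firstName(xml));
    }
}
```

The same rule applies to the match patterns in a stylesheet: the prefix declared on xsl:stylesheet is only a local alias for the URI.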
Input:
<ns3:Prsn xmlns:ns3="some-namespace-uri">
<ns3:FrstNm>BDMFN</ns3:FrstNm>
<ns3:Nm>BDMSN</ns3:Nm>
<ns3:BirthDt>2000-01-02</ns3:BirthDt>
<ns3:Othr>
<ns3:Id>GB1592102</ns3:Id>
<ns3:SchmeNm>
<ns3:Cd>CCPT</ns3:Cd>
</ns3:SchmeNm>
</ns3:Othr>
</ns3:Prsn>
XSLT 2.0:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:person="some-namespace-uri"
>
<xsl:output method="xml" indent="yes" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="person:FrstNm|person:Nm|person:BirthDt">
<xsl:copy>
<xsl:value-of select="replace(text(), '[A-Za-z0-9]', '#')" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output:
<ns3:Prsn xmlns:ns3="some-namespace-uri">
<ns3:FrstNm>#####</ns3:FrstNm>
<ns3:Nm>#####</ns3:Nm>
<ns3:BirthDt>####-##-##</ns3:BirthDt>
<ns3:Othr>
<ns3:Id>GB1592102</ns3:Id>
<ns3:SchmeNm>
<ns3:Cd>CCPT</ns3:Cd>
</ns3:SchmeNm>
</ns3:Othr>
</ns3:Prsn>
If you only have XSLT 1.0 available, you can use translate(). But that requires that you either explicitly list all possible input characters:
<xsl:template match="person:FrstNm|person:Nm|person:BirthDt">
<xsl:copy>
<xsl:value-of select="translate(
text(),
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-add-everything-else',
'##################################################################################'
)" />
</xsl:copy>
</xsl:template>
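As a rough sketch of how the translate() variant could be run from Java, the following uses the XSLT 1.0 processor that ships with the JDK (no Saxon needed). The class name TranslateMaskDemo and the method mask are invented for this example, and the character list only covers ASCII letters and digits. Because translate() deletes any character that has no counterpart in the replacement string, the string of # characters is generated to match the length of the character list (String.repeat needs Java 11 or later).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the original question.
class TranslateMaskDemo {
    // Characters to mask; extend this list if other characters can occur.
    static final String FROM =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    // Same length as FROM, otherwise translate() would drop characters.
    static final String TO = "#".repeat(FROM.length());

    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:person='some-namespace-uri'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='@* | node()'>",
        "    <xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match='person:FrstNm|person:Nm|person:BirthDt'>",
        "    <xsl:copy>",
        "      <xsl:value-of select=\"translate(., '" + FROM + "', '" + TO + "')\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet above to the given XML string.
    static String mask(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns3:Prsn xmlns:ns3='some-namespace-uri'>"
            + "<ns3:FrstNm>BDMFN</ns3:FrstNm>"
            + "<ns3:BirthDt>2000-01-02</ns3:BirthDt>"
            + "</ns3:Prsn>";
        System.out.println(mask(xml));
    }
}
```

Since the hyphen is not in the character list here, the separators in the date survive, just as in the regex version.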
or that you settle on something simpler:
<xsl:template match="person:FrstNm|person:Nm|person:BirthDt">
<xsl:copy>
<xsl:text>[redacted]</xsl:text>
</xsl:copy>
</xsl:template>
Tricky part is when we have tag <Othr><SchmeNm><Cd> which can have values [NIND, CCPT, CONCAT], we need to mask <Othr><id>, but any other value in <Othr><SchmeNm><Cd> apart from NIND, CCPT, CONCAT then no change in <Othr><id>.
That's easy. In XSLT 1.0 this works:
<xsl:template match="
person:FrstNm|person:Nm|person:BirthDt|person:Id[
../person:SchmeNm/person:Cd = 'NIND' or
../person:SchmeNm/person:Cd = 'CCPT' or
../person:SchmeNm/person:Cd = 'CONCAT'
]
">
or even this:
<xsl:template match="
person:FrstNm|person:Nm|person:BirthDt|person:Id[
contains('|NIND|CCPT|CONCAT|', concat('|', ../person:SchmeNm/person:Cd, '|'))
]
">
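">
To see the conditional match in a complete stylesheet, here is a hedged Java sketch that embeds the first XSLT 1.0 pattern above and runs it with the JDK's built-in processor. The names ConditionalMaskDemo, mask and sample are made up for the example, the masking body uses translate() over ASCII letters and digits only, and "TXID" is just an arbitrary code outside the NIND/CCPT/CONCAT list.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the original question.
class ConditionalMaskDemo {
    static final String FROM =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    // Same length as FROM, otherwise translate() would drop characters.
    static final String TO = "#".repeat(FROM.length());

    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:person='some-namespace-uri'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='@* | node()'>",
        "    <xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>",
        "  </xsl:template>",
        // Id is masked only when its sibling SchmeNm/Cd has a listed code.
        "  <xsl:template match=\"person:FrstNm|person:Nm|person:BirthDt|person:Id[",
        "      ../person:SchmeNm/person:Cd = 'NIND' or",
        "      ../person:SchmeNm/person:Cd = 'CCPT' or",
        "      ../person:SchmeNm/person:Cd = 'CONCAT']\">",
        "    <xsl:copy>",
        "      <xsl:value-of select=\"translate(., '" + FROM + "', '" + TO + "')\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet above to the given XML string.
    static String mask(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Builds a small Prsn document with the given scheme code.
    static String sample(String code) {
        return "<ns3:Prsn xmlns:ns3='some-namespace-uri'>"
            + "<ns3:FrstNm>BDMFN</ns3:FrstNm>"
            + "<ns3:Othr><ns3:Id>GB1592102</ns3:Id>"
            + "<ns3:SchmeNm><ns3:Cd>" + code + "</ns3:Cd></ns3:SchmeNm></ns3:Othr>"
            + "</ns3:Prsn>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mask(sample("CCPT"))); // Id should come out masked
        System.out.println(mask(sample("TXID"))); // Id should stay as it is
    }
}
```

The Cd element itself is left untouched in both cases; only the Id next to it is affected by the predicate.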
In XSLT 2.0 you can use sequences:
<xsl:template match="
person:FrstNm|person:Nm|person:BirthDt|person:Id[
../person:SchmeNm/person:Cd = ('NIND', 'CCPT', 'CONCAT')
]
">