XSLT filtering for both starting and following characters

Time:10-21

I am working on a project where I am given a list of allowed characters and am required to remove the unwanted ones. I have the following working, but I feel it is more cumbersome than it should be:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="follow">0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ?abcdefghijklmnopqrstuvwxyz-&apos;.,/@&amp;()! </xsl:variable>
    <xsl:variable name="start">0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ?abcdefghijklmnopqrstuvwxyz</xsl:variable>
    <xsl:template match="/">
        <html>
            <body>
                <xsl:choose>
                    <xsl:when test="contains($start, substring(normalize-space(/Author/Name/FirstName),1,1)) and 
                    string-length(substring(normalize-space(/Author/Name/FirstName),1,1)) > 0 and
                    string-length(translate(substring(normalize-space(/Author/Name/FirstName),2),translate(substring(normalize-space(/Author/Name/FirstName),2),$follow,''),'')) &gt; 0">
                        <div>
                            <xsl:value-of select="translate(substring(normalize-space(/Author/Name/FirstName),1),
                            translate(substring(normalize-space(/Author/Name/FirstName),1),$follow,''),'')" />
                        </div>    
                    </xsl:when>
                    <xsl:otherwise>NULL</xsl:otherwise>
                </xsl:choose>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

To test the start condition I have added three checks. The contains() check returns true for the empty string, so I added a string-length condition in order to return NULL in that case.

<FirstName>?    #</FirstName>//NULL
<FirstName></FirstName>//NULL
<FirstName>   ??</FirstName>//??
<LastName>?t*#</LastName>//?t

My XML for testing is below

<?xml version="1.0" encoding="UTF-8"?>
<Author>
    <Name>
        <FirstName>xxx</FirstName>
    </Name>
</Author>

I may have missed some edge cases. My question is: is there a better way to solve this XSLT filtering task, where the starting character and the following characters are subject to different conditions?

EDIT: Reading michael.hor257k's comment made me question my approach and understand my requirement better. There is a Cybersource page that specifies the allowed characters for requests to their API. My goal is to remove unwanted characters and make sure that the field's starting character and following characters meet the specs given on the website. Take the Ship-To Company name as an example. I am using XSLT 1.0 with the Java Transformer class.

CodePudding user response:

Consider the following simplified example:

XML

<input>
    <item>alpha</item>
    <item>-alpha</item>
    <item>alp§ha</item>
    <item>---al§pha§</item>
    <item>§al-pha</item>
</input>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:variable name="allowed-start-chars">abcdefghijklmnopqrstuvwxyz</xsl:variable>
<xsl:variable name="allowed-follow-chars">abcdefghijklmnopqrstuvwxyz-</xsl:variable>

<xsl:template match="/input">
    <output>
        <xsl:apply-templates/>
    </output>
</xsl:template>

<xsl:template match="item">
    <!-- find the first character eligible to be starting character -->
    <xsl:variable name="start-chars" select="translate(., translate(., $allowed-start-chars, ''), '')"/>
    <xsl:variable name="start-char" select="substring($start-chars, 1, 1)"/>
    <!-- get text after the chosen starting character -->
    <xsl:variable name="tail" select="substring-after(., $start-char)"/>
    <result original="{.}">
        <xsl:value-of select="$start-char"/>
        <!-- remove unwanted characters from tail -->
        <xsl:value-of select="translate($tail, translate($tail, $allowed-follow-chars, ''), '')"/>
    </result>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<output>
   <result original="alpha">alpha</result>
   <result original="-alpha">alpha</result>
   <result original="alp§ha">alpha</result>
   <result original="---al§pha§">alpha</result>
   <result original="§al-pha">al-pha</result>
</output>

You might want to add a test for the case where all characters turn out to be illegal - although that seems highly unlikely.
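
The test suggested above could be sketched as follows. This is only a variation of the stylesheet above, not a definitive fix: it assumes the same two variables, and it borrows the NULL placeholder from the question as an assumption about the desired output. The case worth guarding is an input such as `---`, which contains no eligible starting character. There `$start-char` is empty, common XSLT 1.0 processors return the whole string from `substring-after(., '')`, and the hyphens then pass the follow-character filter unchanged.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:variable name="allowed-start-chars">abcdefghijklmnopqrstuvwxyz</xsl:variable>
<xsl:variable name="allowed-follow-chars">abcdefghijklmnopqrstuvwxyz-</xsl:variable>

<xsl:template match="/input">
    <output>
        <xsl:apply-templates/>
    </output>
</xsl:template>

<xsl:template match="item">
    <!-- characters of the input that are eligible to start the result -->
    <xsl:variable name="start-chars" select="translate(., translate(., $allowed-start-chars, ''), '')"/>
    <result original="{.}">
        <xsl:choose>
            <!-- a non-empty string converts to true in XPath 1.0 -->
            <xsl:when test="$start-chars">
                <xsl:variable name="start-char" select="substring($start-chars, 1, 1)"/>
                <xsl:variable name="tail" select="substring-after(., $start-char)"/>
                <xsl:value-of select="$start-char"/>
                <!-- remove unwanted characters from tail -->
                <xsl:value-of select="translate($tail, translate($tail, $allowed-follow-chars, ''), '')"/>
            </xsl:when>
            <!-- assumption: NULL marks "no usable value", as in the question -->
            <xsl:otherwise>NULL</xsl:otherwise>
        </xsl:choose>
    </result>
</xsl:template>

</xsl:stylesheet>
```

For the five sample items the output is the same as in the result above; the only difference is that an item with no eligible starting character, for example `<item>---</item>`, yields NULL.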


Added:

If all you want is to test the input for being valid, then you could do:

<xsl:template match="item">
    <!-- test the first character; string(.) rejects empty input, for which contains() returns true -->
    <xsl:variable name="valid-start-char" select="string(.) and contains($allowed-start-chars, substring(., 1, 1))"/>
    <!-- test following characters  -->
    <xsl:variable name="invalid-follow-chars" select="translate(substring(., 2), $allowed-follow-chars, '')"/>
    <result original="{.}">
       <xsl:choose>
        <xsl:when test="$valid-start-char and not($invalid-follow-chars)">
            <xsl:value-of select="."/>
        </xsl:when>
        <xsl:otherwise>NULL</xsl:otherwise>
       </xsl:choose>
    </result>
</xsl:template>

to get:

<?xml version="1.0" encoding="UTF-8"?>
<output>
    <result original="alpha">alpha</result>
    <result original="-alpha">NULL</result>
    <result original="alp§ha">NULL</result>
    <result original="---al§pha§">NULL</result>
    <result original="§al-pha">NULL</result>
</output>
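
The same validity test could be applied to the document from the question. The sketch below is an illustration under assumptions, not the asker's final stylesheet: the character lists are copied from the question, the `result` wrapper element is borrowed from the example above, and `$name` is a helper variable introduced here so that `normalize-space()` is called only once. The explicit comparison with the empty string is there because `contains()` returns true when its second argument is empty.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- character lists copied from the question -->
<xsl:variable name="start">0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ?abcdefghijklmnopqrstuvwxyz</xsl:variable>
<xsl:variable name="follow">0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ?abcdefghijklmnopqrstuvwxyz-&apos;.,/@&amp;()! </xsl:variable>

<xsl:template match="/Author">
    <!-- trim and collapse whitespace once (helper variable, not in the original) -->
    <xsl:variable name="name" select="normalize-space(Name/FirstName)"/>
    <!-- reject empty input explicitly, then test the first character -->
    <xsl:variable name="valid-start-char" select="$name != '' and contains($start, substring($name, 1, 1))"/>
    <!-- whatever survives this translate() is not an allowed following character -->
    <xsl:variable name="invalid-follow-chars" select="translate(substring($name, 2), $follow, '')"/>
    <result>
        <xsl:choose>
            <xsl:when test="$valid-start-char and not($invalid-follow-chars)">
                <xsl:value-of select="$name"/>
            </xsl:when>
            <xsl:otherwise>NULL</xsl:otherwise>
        </xsl:choose>
    </result>
</xsl:template>

</xsl:stylesheet>
```

Note that this variant only accepts or rejects the value; it does not strip characters the way the first stylesheet does.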