Get value between a word and the linebreak in XML

Time:06-03

I want to use XSLT to extract values from a string that is part of an XML structure. The word before each colon should become a node name, and the word after the colon should become that node's value. The node names will be the same in every document, but the values will vary, so I thought about using wildcards to extract them. I couldn't find out how to do that. Can you help?

<MxML>
    <mail>
        <body>
            Fruit: apple
            Vagetable: potato
            Animal: dog
        </body>
    </mail>
</MxML> 

So the result should look like:

<MxML>
    <mail>
        <Fruit>apple</Fruit>
        <Vagetable>potato</Vagetable>
        <Animal>dog</Animal>
    </mail>
</MxML>

I'm working with XSLT 2.0

CodePudding user response:

Here is one way you could look at it:

XSLT 2.0

<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="mail">
    <xsl:copy>
        <xsl:for-each select="tokenize(body, '&#10;')[normalize-space()]">
            <xsl:element name="{substring-before(., ': ')}">
                <xsl:value-of select="substring-after(., ': ')"/>
            </xsl:element>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Here's another:

<xsl:template match="mail">
    <xsl:copy>
        <xsl:analyze-string select="body" regex="^(.+): (.+)$" flags="m">
            <xsl:matching-substring>
                <xsl:element name="{regex-group(1)}">
                    <xsl:value-of select="regex-group(2)"/>
                </xsl:element>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:copy>
</xsl:template>

Note that both assume that the first part of each name/value pair is a valid element name.
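
If that assumption might not hold, the name can be checked before the element is created. The following is only a sketch, not part of the answer above: the `matches()` pattern is a deliberately conservative ASCII approximation of a valid XML name, and the fallback element `entry` with its `name` attribute is a hypothetical choice for illustration. It also trims surrounding whitespace with `normalize-space()`, so the indentation inside `body` is not carried into the name or value.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<!-- identity transform: copy everything not handled below -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="mail">
    <xsl:copy>
        <!-- one token per line; keep only lines that contain a colon -->
        <xsl:for-each select="tokenize(body, '\r?\n')[contains(., ':')]">
            <xsl:variable name="name" select="normalize-space(substring-before(., ':'))"/>
            <xsl:variable name="value" select="normalize-space(substring-after(., ':'))"/>
            <xsl:choose>
                <!-- assumption: ASCII-only approximation of a valid element name -->
                <xsl:when test="matches($name, '^[A-Za-z_][A-Za-z0-9_.\-]*$')">
                    <xsl:element name="{$name}">
                        <xsl:value-of select="$value"/>
                    </xsl:element>
                </xsl:when>
                <xsl:otherwise>
                    <!-- hypothetical fallback: keep the pair without using the name as an element name -->
                    <entry name="{$name}">
                        <xsl:value-of select="$value"/>
                    </entry>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

With this variant, a line such as `Fruit: apple` still becomes a `Fruit` element, while a line whose label contains spaces or starts with a digit ends up in an `entry` element instead of raising an error.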

CodePudding user response:

You can use the tokenize() function in a template that matches mail, after first normalizing the body text:

<xsl:template match="mail">
    <xsl:variable name="strvalue" select="replace(./body/text(), '(^\n\s+)|(\n\s+$)', '')"/>
    <xsl:variable name="strvalue" select="replace($strvalue, '\n\s+', '#')"/>
    <xsl:copy>
        <xsl:for-each select="tokenize($strvalue, '#')">
            <xsl:variable select="tokenize(., ': ')" name="values"/>
            <xsl:element name='{$values[1]}'>
                <xsl:value-of select="$values[2]"/>
            </xsl:element>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

This part

<xsl:variable name="strvalue" select="replace(./body/text(), '(^\n\s+)|(\n\s+$)', '')"/>
<xsl:variable name="strvalue" select="replace($strvalue, '\n\s+', '#')"/>

first strips the leading and trailing line break and indentation from the body text, then replaces each remaining line break (with the indentation that follows it) by #, and saves the result in a variable. The string from body then looks like:

Fruit: apple#Vagetable: potato#Animal: dog
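
The # placeholder works as long as no value contains a # itself. As a rough sketch of an alternative (not the answerer's code), the line break and the whitespace around it can serve as the tokenize() separator directly, which makes the intermediate variable unnecessary. The sketch assumes the same input shape as in the question, with one name/value pair per line.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<!-- identity transform: copy everything not handled below -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="mail">
    <xsl:copy>
        <!-- split on each line break plus the whitespace around it;
             the predicate drops the empty leading and trailing tokens -->
        <xsl:for-each select="tokenize(body, '\s*\n\s*')[normalize-space()]">
            <xsl:variable name="values" select="tokenize(., ': ')"/>
            <xsl:element name="{$values[1]}">
                <xsl:value-of select="$values[2]"/>
            </xsl:element>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

For the sample document this should produce the same Fruit, Vagetable and Animal elements as the version with the # separator.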