Tokenizing until next occurrence of data

Time:09-30

I have a string like this:

AA 12345678910
BB TESTTESTTEST
BB TESTTESTTEST
BB TESTTESTTEST
CC TEST
AA 0897654321
BB TESTTESTTEST
CC TEST

How would I group this data by AA? It is just a string, by the way. I could do this by position, but the BB lines occur a variable number of times.

Is it possible to tokenize a chunk of the string? In one sentence: "Group by AA until another AA shows up."

CodePudding user response:

Assuming this input:

<input>
AA 12345678910
BB TESTTESTTEST
BB TESTTESTTEST
BB TESTTESTTEST
CC TEST
AA 0897654321
BB TESTTESTTEST
CC TEST
</input>

and this XSLT (2.0 or later, since tokenize() and its 'm' flag are not available in XSLT 1.0):

<xsl:for-each select="tokenize(input, '^AA ', 'm')">
  <xsl:if test="normalize-space()">
    <block>AA <xsl:value-of select="." /></block>
  </xsl:if>
</xsl:for-each>

we get two blocks:

<block>AA 12345678910
BB TESTTESTTEST
BB TESTTESTTEST
BB TESTTESTTEST
CC TEST
</block><block>AA 0897654321
BB TESTTESTTEST
CC TEST
</block>

tokenize() splits the input string at a delimiter, but it removes the delimiter in the process. That's why we need to add the 'AA ' back manually in the output.
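
For context, the fragment above could sit inside a complete stylesheet along the following lines. This is only a minimal sketch: it assumes XSLT 2.0 or later, that the document element is <input>, and that a wrapper element around the result is wanted (the name <blocks> is hypothetical and not part of the original answer).

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Assumption: the document element is <input> and holds the raw text. -->
  <xsl:template match="/">
    <!-- <blocks> is a hypothetical wrapper so the result has a single root. -->
    <blocks>
      <!-- The 'm' flag makes ^ match at the start of every line. -->
      <xsl:for-each select="tokenize(input, '^AA ', 'm')">
        <!-- Skip the whitespace-only token that precedes the first 'AA '. -->
        <xsl:if test="normalize-space()">
          <!-- tokenize() dropped the delimiter, so put 'AA ' back. -->
          <block>AA <xsl:value-of select="."/></block>
        </xsl:if>
      </xsl:for-each>
    </blocks>
  </xsl:template>

</xsl:stylesheet>
```

Because the delimiter is anchored with ^, an "AA " that appears in the middle of a line does not start a new block.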

CodePudding user response:

In XSLT 3 (available since 2017 and supported by Saxon 9.8 and later, Saxon-JS 2, and Altova XML 2017 R3 and later), you can use xsl:for-each-group with group-starting-with on a sequence of strings:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output indent="yes"/>

  <xsl:template match="data">
    <xsl:copy>
      <xsl:for-each-group select="tokenize(., '\n')[normalize-space()]" group-starting-with=".[starts-with(., 'AA')]">
        <group pos="{position()}">
          <xsl:apply-templates select="current-group()"/>
        </group>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match=".[. instance of xs:string]">
    <xsl:element name="{substring(., 1, 2)}"/>
  </xsl:template>
  
</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/6qaHaS5

In XSLT 2, where group-starting-with only works on a sequence of nodes, one way to use for-each-group in a similar fashion is to first transform the text lines into XML elements.
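
The XSLT 2 approach just described could be sketched as follows. This is an illustration under assumptions, not a tested solution: it assumes the same <data> input element as the XSLT 3 stylesheet above, and the temporary <line> elements and the $lines variable are hypothetical names introduced only for this example.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output indent="yes"/>

  <!-- Assumption: the raw text is the content of a <data> element. -->
  <xsl:template match="data">
    <!-- Step 1: wrap each non-blank text line in a temporary <line> element. -->
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(., '\n')[normalize-space()]">
        <line><xsl:value-of select="normalize-space(.)"/></line>
      </xsl:for-each>
    </xsl:variable>
    <xsl:copy>
      <!-- Step 2: the population is now nodes, so an XSLT 2.0 pattern can match it. -->
      <xsl:for-each-group select="$lines/line"
                          group-starting-with="line[starts-with(., 'AA')]">
        <group pos="{position()}">
          <!-- Mirror the XSLT 3 version: one empty element per line, named by its two-letter prefix. -->
          <xsl:for-each select="current-group()">
            <xsl:element name="{substring(., 1, 2)}"/>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The variable holds a temporary tree, so $lines/line selects the generated elements in document order, and every line that starts with AA opens a new group.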
