Split google XML items by param value

Time:10-04

I have one XML file in the Google vendor (Merchant feed) standard. It contains two language versions of the products. I would like to split it into two separate XML files, keeping the file structure. The hard part is that the language is given only in the link field (.pl or .en). Two separate files would be great, but one is also fine (I'll just make a second run with the second condition).

I was thinking of something like this in JavaScript (pseudocode):

file.rss.channel.children.filter(item=>{return item.link.includes(".en")})

I've tried xmlstarlet, but with no success.

Input file preview:

<rss xmlns:g="http://base.google.com/ns/1.0"
    xmlns:c="http://base.google.com/cns/1.0" version="2.0">
    <channel>
        <title>
            <![CDATA[ Brand ]]>
        </title>
        <link><![CDATA[ site.link.pl ]]></link>
        <description><![CDATA[  ]]></description>
        <item>
            <g:id>132430</g:id>
            <link><![CDATA[item.link.pl]]></link>
            <g:canonical_link>item.link.pl</g:canonical_link>
        </item>
        <item>
            <g:id>132431</g:id>
            <link><![CDATA[item.link.en]]></link>
            <g:canonical_link>item.link.en</g:canonical_link>
        </item>
    </channel>
</rss>

Expected result file:

<rss xmlns:g="http://base.google.com/ns/1.0"
    xmlns:c="http://base.google.com/cns/1.0" version="2.0">
    <channel>
        <title>
            <![CDATA[ Brand ]]>
        </title>
        <link><![CDATA[ site.link.pl ]]></link>
        <description><![CDATA[  ]]></description>
        <item>
            <g:id>132431</g:id>
            <link><![CDATA[item.link.en]]></link>
            <g:canonical_link>item.link.en</g:canonical_link>
        </item>
    </channel>
</rss>

I have no clue how to achieve this, and I will be very grateful for any hints.

CodePudding user response:

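This XSLT 3.0 stylesheet (it groups the items by the last dot-separated token of their link, so each language gets one file holding all of its items):

<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:mode on-no-match="shallow-copy"/>
   <xsl:strip-space elements="*"/>

   <!-- One result document per language; the grouping key ("pl", "en")
        is the last dot-separated token of the item's link. -->
   <xsl:template match="/">
      <xsl:for-each-group select="//item"
                          group-by="tokenize(normalize-space(link), '\.')[last()]">
         <xsl:result-document href="{current-grouping-key()}.xml"
                              indent="yes"
                              cdata-section-elements="title link">
            <!-- Re-process the whole input, passing along the items of this group. -->
            <xsl:apply-templates select="root(.)/*">
               <xsl:with-param name="group-items" select="current-group()"/>
            </xsl:apply-templates>
         </xsl:result-document>
      </xsl:for-each-group>
   </xsl:template>

   <!-- Copy an item only if it belongs to the group being written. -->
   <xsl:template match="item">
      <xsl:param name="group-items" as="element(item)*"/>
      <xsl:if test="exists(. intersect $group-items)">
         <xsl:copy-of select="."/>
      </xsl:if>
   </xsl:template>

</xsl:transform>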

outputs two result files, en.xml and pl.xml. Here is en.xml:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:c="http://base.google.com/cns/1.0"
     xmlns:g="http://base.google.com/ns/1.0"
     version="2.0">
   <channel>
      <title><![CDATA[
          Brand 
      ]]></title>
      <link><![CDATA[ site.link.pl ]]></link>
      <description/>
      <item>
         <g:id>132431</g:id>
         <link><![CDATA[item.link.en]]></link>
         <g:canonical_link>item.link.en</g:canonical_link>
      </item>
   </channel>
</rss>
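
If no XSLT 3.0 processor is at hand, the filter idea from the question's pseudocode can also be sketched in Python with only the standard library. This is a rough sketch, not a drop-in tool: the function name split_feed, the markers parameter and the inline SAMPLE feed are made up for illustration, and it assumes that a substring test on the link text (as in the pseudocode) is enough to tell the languages apart. Note that ElementTree does not keep CDATA sections (the text is written back as ordinary escaped text) and drops the unused xmlns:c declaration.

```python
import copy
import xml.etree.ElementTree as ET

# Keep the "g" prefix on output instead of an auto-generated one.
ET.register_namespace("g", "http://base.google.com/ns/1.0")

# Hypothetical sample modeled on the input preview in the question.
SAMPLE = """<rss xmlns:g="http://base.google.com/ns/1.0"
    xmlns:c="http://base.google.com/cns/1.0" version="2.0">
    <channel>
        <title><![CDATA[ Brand ]]></title>
        <link><![CDATA[ site.link.pl ]]></link>
        <description><![CDATA[  ]]></description>
        <item>
            <g:id>132430</g:id>
            <link><![CDATA[item.link.pl]]></link>
            <g:canonical_link>item.link.pl</g:canonical_link>
        </item>
        <item>
            <g:id>132431</g:id>
            <link><![CDATA[item.link.en]]></link>
            <g:canonical_link>item.link.en</g:canonical_link>
        </item>
    </channel>
</rss>"""


def split_feed(xml_text, markers=(".pl", ".en")):
    """Return {marker: feed XML}, each keeping only items whose <link> contains marker."""
    root = ET.fromstring(xml_text)
    results = {}
    for marker in markers:
        feed = copy.deepcopy(root)          # work on a fresh copy per language
        channel = feed.find("channel")
        for item in list(channel.findall("item")):
            link = (item.findtext("link") or "").strip()
            if marker not in link:          # assumption: substring test, as in the pseudocode
                channel.remove(item)
        results[marker] = ET.tostring(feed, encoding="unicode")
    return results


feeds = split_feed(SAMPLE)

if __name__ == "__main__":
    # Write one file per language, e.g. pl.xml and en.xml.
    for marker, xml_out in feeds.items():
        with open(marker.lstrip(".") + ".xml", "w", encoding="utf-8") as fh:
            fh.write(xml_out)
```

Channel-level elements (title, link, description) are copied into every output untouched, because only item children of channel are ever removed.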