Get XSI Type from XML in Perl

Time:12-26

There are a bunch of XML files in different sub-folders under a root folder. Some of them have the following contents.

XML-1

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
        <CableType>XY-1</CableType>
        <Name>C-SPAN</Name>
    </Genre>
    <displayName>C-SPAN Network</displayName>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Sports">
        <CableType>XY-2</CableType>
        <Name>Fox</Name>
    </Genre>
    <displayName>Fox Sports</displayName>
</Channels>

XML-2

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
        <CableType>XY-1</CableType>
        <Name>ABC</Name>
    </Genre>
    <displayName>ABC News</displayName>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Movies">
        <CableType>XY-2</CableType>
        <Name>HBO</Name>
    </Genre>
    <displayName>HBO Movies</displayName>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
        <CableType>XY-3</CableType>
        <Name>CBS</Name>
    </Genre>
    <displayName>CBS News</displayName>
</Channels>

XML-3

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
        <CableType>XY-1</CableType>
        <Name>PBS</Name>
    </Genre>
    <displayName>PBS News</displayName>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Sports">
        <CableType>XY-@</CableType>
        <Name>ESPN</Name>
    </Genre>
    <displayName>ESPN Network</displayName>
</Channels>

The goal is to walk all sub-folders, parse each XML file, and look at the xsi:type values. Most files are expected to contain only one xsi:type="News", but in this case XML-2 contains two.

Below is the Perl script I have come up with so far. It walks all sub-folders, finds the XML files, and adds them to an array. I now need help finding the XML files that have more than one xsi:type="News".

use strict;
use warnings;
use feature qw(say);   # enables say
use File::Find;

my $dir = "C:\\perl_scripts";

# Collect every .xml file under $dir, including sub-folders
my @file_list;
find ( sub {
    return unless -f;       #Must be a file
    return unless /\.xml$/;  #Must end with `.xml` suffix
    push @file_list, $File::Find::name;
}, $dir );

foreach my $title (@file_list) {
    say $title;
}

How can I count the occurrences of xsi:type="News" in each file and print to the console the files where that count is greater than 1?

For the three XML files above, it should print XML-2.

UPDATE:

Here's the final code:

use feature qw(say);
use strict;
use warnings;
use File::Find;
use XML::LibXML;

my $dir = "C:\\perl_scripts";

# Collect every .xml file under $dir, including sub-folders
my @file_list;
find ( sub {
    return unless -f;       #Must be a file
    return unless /\.xml$/;  #Must end with `.xml` suffix
    push @file_list, $File::Find::name;
}, $dir );

foreach my $title (@file_list){
    my $doc = XML::LibXML->load_xml(location => $title);
    my %xsi_type;   # xsi:type value => number of <Genre> elements carrying it
    for my $node ($doc->findnodes('//Genre')) {
        my $type = $node->getAttribute('xsi:type');
        next unless defined $type;   # skip <Genre> elements without xsi:type
        $xsi_type{$type}++;
    }
    # Default to 0 so a file without any "News" entry does not trigger a warning
    if (($xsi_type{News} // 0) > 1) {
        print 'Found file with more than one xsi:type="News" ==> ';
        say $title;
    }
}
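
One thing the loop above does not cover is a malformed XML file: load_xml dies on a parse error, which would abort the whole scan. The following is a minimal sketch, not part of the original post, of one way to keep going. The helper name files_with_repeated_type and its parameters are hypothetical, and it assumes the directory to scan is passed as the first command-line argument.

```perl
use strict;
use warnings;
use feature qw(say);
use File::Find;
use XML::LibXML;

# Hypothetical helper: return the .xml files under $dir that contain more
# than $min <Genre> elements whose xsi:type equals $type.
sub files_with_repeated_type {
    my ($dir, $type, $min) = @_;
    my @hits;
    find(sub {
        return unless -f;           # must be a file
        return unless /\.xml$/i;    # must end with .xml
        # File::Find has changed into the file's directory, so $_ is a valid path.
        # load_xml dies on malformed input; eval lets the scan continue.
        my $doc = eval { XML::LibXML->load_xml(location => $_) };
        unless ($doc) {
            warn "Skipping $File::Find::name: $@";
            return;
        }
        # grep in scalar context returns the number of matching elements.
        my $count = grep { ($_->getAttribute('xsi:type') // '') eq $type }
                    $doc->findnodes('//Genre');
        push @hits, $File::Find::name if $count > $min;
    }, $dir);
    return sort @hits;
}

# Assumption: the root folder is given on the command line.
if (@ARGV) {
    say for files_with_repeated_type($ARGV[0], 'News', 1);
}
```

Returning the list instead of printing inside the callback keeps the reporting separate from the traversal, so the same helper could be reused for other xsi:type values.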

CodePudding user response:

Here is an example of how you can use XML::LibXML to determine whether a file has more than one element with xsi:type="News":

use feature qw(say);
use strict;
use warnings;
use XML::LibXML;

my $xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
        <CableType>XY-1</CableType>
        <Name>ABC</Name>
    </Genre>
    <displayName>ABC News</displayName>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Movies">
        <CableType>XY-2</CableType>
        <Name>HBO</Name>
    </Genre>
    <displayName>HBO Movies</displayName>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
        <CableType>XY-3</CableType>
        <Name>CBS</Name>
    </Genre>
    <displayName>CBS News</displayName>
</Channels>';

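my $doc = XML::LibXML->load_xml(string => $xml);
my %xsi_type;   # xsi:type value => number of <Genre> elements carrying it
for my $node ($doc->findnodes('//Genre')) {
    my $type = $node->getAttribute('xsi:type');
    next unless defined $type;   # skip <Genre> elements without xsi:type
    $xsi_type{$type}++;
}
# Default to 0 so a document without any "News" entry does not trigger a warning
if (($xsi_type{News} // 0) > 1) {
    say 'Found file with more than one xsi:type="News"';
}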
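
The lookup above depends on the documents using the literal prefix xsi. As a hedged variation, not taken from the answer itself, the attribute can be matched by namespace URI with getAttributeNS, so a file that binds the same namespace to a different prefix is still counted. The helper name count_xsi_type and the constant XSI_NS are made up for this illustration.

```perl
use strict;
use warnings;
use feature qw(say);
use XML::LibXML;

# Namespace URI that the xsi prefix is bound to in the sample files.
use constant XSI_NS => 'http://www.w3.org/2001/XMLSchema-instance';

# Hypothetical helper: count <Genre> elements whose type attribute in the
# XMLSchema-instance namespace equals $type, whatever prefix the file uses.
sub count_xsi_type {
    my ($doc, $type) = @_;
    # grep in scalar context returns the number of matching elements.
    return scalar grep { ($_->getAttributeNS(XSI_NS, 'type') // '') eq $type }
                       $doc->findnodes('//Genre');
}

my $sample = XML::LibXML->load_xml(string => <<'XML');
<Channels>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News"><Name>ABC</Name></Genre>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Movies"><Name>HBO</Name></Genre>
    <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News"><Name>CBS</Name></Genre>
</Channels>
XML

say 'Found more than one xsi:type="News"' if count_xsi_type($sample, 'News') > 1;
```

For the files shown in the question, both approaches should give the same counts; the difference only matters if some file declares the namespace under another prefix.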