There are a bunch of XML files in different sub-folders under a root folder. Some of them have the following contents.
XML-1
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
<CableType>XY-1</CableType>
<Name>C-SPAN</Name>
</Genre>
<displayName>C-SPAN Network</displayName>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Sports">
<CableType>XY-2</CableType>
<Name>Fox</Name>
</Genre>
<displayName>Fox Sports</displayName>
</Channels>
XML-2
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
<CableType>XY-1</CableType>
<Name>ABC</Name>
</Genre>
<displayName>ABC News</displayName>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Movies">
<CableType>XY-2</CableType>
<Name>HBO</Name>
</Genre>
<displayName>HBO Movies</displayName>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
<CableType>XY-3</CableType>
<Name>CBS</Name>
</Genre>
<displayName>CBS News</displayName>
</Channels>
XML-3
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
<CableType>XY-1</CableType>
<Name>PBS</Name>
</Genre>
<displayName>PBS News</displayName>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Sports">
<CableType>XY-2</CableType>
<Name>ESPN</Name>
</Genre>
<displayName>ESPN Network</displayName>
</Channels>
The goal is to go through all sub-folders, parse each XML file, and look at the xsi:type values. Most files are expected to contain only one xsi:type="News", but in this case XML-2 contains two.
Below is the Perl script I have come up with so far. It goes through all sub-folders, finds the XML files, and adds them to an array. I now need help finding the XML files that have more than one xsi:type="News".
use strict;
use warnings;
use feature qw(say);
use File::Find;

my $dir = "C:\\perl_scripts";

# Collect every .xml file under $dir, including sub-folders.
my @file_list;
find( sub {
    return unless -f;          # Must be a file
    return unless /\.xml$/;    # Must end with `.xml` suffix
    push @file_list, $File::Find::name;
}, $dir );

foreach my $title (@file_list) {
    say $title;
}
How can I count the xsi:type="News" occurrences in each file and print to the console the files where that count is greater than 1?
For the three XML files above, it should print XML-2.
UPDATE:
Here's the final code:
use feature qw(say);
use strict;
use warnings;
use File::Find;
use XML::LibXML;

my $dir = "C:\\perl_scripts";

# Collect every .xml file under $dir, including sub-folders.
my @file_list;
find( sub {
    return unless -f;          # Must be a file
    return unless /\.xml$/;    # Must end with `.xml` suffix
    push @file_list, $File::Find::name;
}, $dir );

foreach my $title (@file_list) {
    my $doc = XML::LibXML->load_xml(location => $title);

    # Count how often each xsi:type value occurs on a <Genre> element.
    my %xsi_type;
    for my $node ($doc->findnodes('//Genre')) {
        my $type = $node->getAttribute('xsi:type');
        $xsi_type{$type}++ if defined $type;
    }

    # Default to 0 so files without any "News" entry do not trigger a warning.
    if (($xsi_type{News} // 0) > 1) {
        print 'Found file with more than one xsi:type="News" ==> ';
        say $title;
    }
}
Answer:
Here is an example of how you can use XML::LibXML to determine if a file has more than one tag with xsi:type="News":
use feature qw(say);
use strict;
use warnings;
use XML::LibXML;
my $xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Channels>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
<CableType>XY-1</CableType>
<Name>ABC</Name>
</Genre>
<displayName>ABC News</displayName>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Movies">
<CableType>XY-2</CableType>
<Name>HBO</Name>
</Genre>
<displayName>HBO Movies</displayName>
<Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News">
<CableType>XY-3</CableType>
<Name>CBS</Name>
</Genre>
<displayName>CBS News</displayName>
</Channels>';
my $doc = XML::LibXML->load_xml(string => $xml);
my %xsi_type;
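for my $node ($doc->findnodes('//Genre')) {
    # Count each xsi:type value; skip <Genre> elements without the attribute.
    my $type = $node->getAttribute('xsi:type');
    $xsi_type{$type}++ if defined $type;
}
# Default to 0 so a document without any "News" entry does not trigger a warning.
if (($xsi_type{News} // 0) > 1) {
    say 'Found file with more than one xsi:type="News"';
}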
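As a possible alternative to tallying every xsi:type value in a hash, the matching can be pushed into the XPath expression itself. The following is only a sketch: it assumes the xsi prefix always maps to the XMLSchema-instance namespace shown in the sample files, and count_news is a hypothetical helper name, not part of XML::LibXML.

```perl
use strict;
use warnings;
use feature qw(say);
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper: returns the number of <Genre> elements in $doc
# whose xsi:type attribute equals "News".
sub count_news {
    my ($doc) = @_;
    my $xpc = XML::LibXML::XPathContext->new($doc);
    # Bind the xsi prefix used in the XPath below to its namespace URI.
    $xpc->registerNs(xsi => 'http://www.w3.org/2001/XMLSchema-instance');
    my @news = $xpc->findnodes('//Genre[@xsi:type="News"]');
    return scalar @news;
}

# Shortened version of XML-2 from the question.
my $xml = <<'XML';
<Channels>
  <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News"><Name>ABC</Name></Genre>
  <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Movies"><Name>HBO</Name></Genre>
  <Genre xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="News"><Name>CBS</Name></Genre>
</Channels>
XML

my $doc   = XML::LibXML->load_xml(string => $xml);
my $count = count_news($doc);
say 'Found file with more than one xsi:type="News"' if $count > 1;
```

Registering the namespace explicitly means the match depends on the namespace URI rather than on whichever prefix a particular file happens to declare. To use this on files from disk, load each one with load_xml(location => ...) instead of the string.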