I need to search by attributes using a regular expression.
In Python it would look like this:
from lxml import etree
dom = etree.parse(r'/path/to/file.XML')
regexpNS = "http://exslt.org/regular-expressions"
els = dom.xpath("//*[(re:test(@NAME, '.*Town.*', 'i')) and (@ISACTIVE='1' )]", namespaces={'re':regexpNS})
el = els[0]
print(el.attrib['NAME'] " => " el.attrib['OBJECTGUID'])
I don't understand how to do it in perl
my $dom = XML::LibXML->new->parse_file("/path/to/file.XML");
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('re', 'http://exslt.org/regular-expressions');
print $xpc->findnodes(q{//*[(re:test(@NAME, '.*Town.*', 'i')) and (@ISACTIVE='1' )]});
gives an error message
error : xmlXPathCompOpEval: function test not found XPath error :
Unregistered function at line ...
I tried to rewrite the well-known example:
Custom XPath functions
This example demonstrates registerFunction() method by defining a function filtering nodes based on a Perl regular expression:
sub grep_nodes {
my ($nodelist,$regexp) = @_;
my $result = XML::LibXML::NodeList->new;
for my $node ($nodelist->get_nodelist()) {
$result->push($node) if $node->textContent =~ $regexp;
}
return $result;
};
my $xc = XML::LibXML::XPathContext->new($node);
$xc->registerFunction('grep_nodes', \&grep_nodes);
my @nodes = $xc->findnodes('//section[grep_nodes(para,"\bsearch(ing|es)?\b")]');
Rewrote it like this:
use XML::LibXML;
my $dom = XML::LibXML->new->parse_string(<<'EOT');
<?xml version="1.0" encoding="utf-8"?>
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOT
sub grep_attrs {
my ($nodelist,$attr_name,$regexp) = @_;
my $result = XML::LibXML::NodeList->new;
for my $node ($nodelist->get_nodelist()) {
my %attrs = map { $_->getName => $_->getValue } $node->attributes;
$result->push($node) if $attrs{$attr_name} =~ $regexp;
print $attrs{$attr_name}."\n" if $attrs{$attr_name} =~ $regexp;
}
return $result;
};
print "\n-========================================-\n";
my $xc = XML::LibXML::XPathContext->new($dom);
$xc->registerFunction('grep_attrs', \&grep_attrs);
my @nodes = $xc->findnodes(q{//*[grep_attrs(OBJECT,'NAME','.*Town.*')]});
print "\n-========================================-\n";
print @nodes;
print "\n-========================================-\n";
output result:
-========================================-
DownTown
MidthTown
UpTown
-========================================-
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
</ADDRESSOBJECTS>
-========================================-
function works, but!
- too long, many times longer than in python
- For some reason it returns the full tree, not the found nodes.
Help me understand the problem and how can use regular expressions when searching by attributes??
CodePudding user response:
Using XML::XPath
instead, which has support for regular expressions via the matches()
XPath 2.0 function as of a few releases ago thanks to yours truly:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use XML::XPath v1.45;
my $xml = <<'EOXML';
<?xml version="1.0" encoding="utf-8"?>
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOXML
my $xp = XML::XPath->new(xml => $xml);
my @nodes = $xp->findnodes(q{//*[matches(@NAME, 'town', 'i') and @ISACTIVE = 1]});
for my $node (@nodes) {
say $node->getAttribute('NAME');
}
prints out
DownTown
MidthTown
UpTown
CodePudding user response:
Solution using the XML::LibXML module and a registered function:
#!/usr/bin/env perl -CSDA
use utf8;
use warnings;
use strict;
use feature qw/say/;
use XML::LibXML;
my $xml = <<'EOXML';
<?xml version="1.0" encoding="utf-8"?>
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOXML
my $dom = XML::LibXML->new->parse_string($xml);
sub xpath_matches {
my ($input,$pattern,$flg) = @_;
$flg = '' if !defined ($flg);
return 1 if $input =~ /(?$flg)$pattern/;
return undef;
}
my $xc = XML::LibXML::XPathContext->new($dom);
$xc->registerFunction('matches', \&xpath_matches);
say $_->getAttribute('NAME').' '.$_->getAttribute('TYPENAME')
for $xc->findnodes(q{
//OBJECT[@NAME and matches(@NAME,'.*[tT]oWn$','i')]
});
PS: because there are no boolean values in perl, and matches is a function that returns some value, then its result must be compared with some value, in this case with 1, if only one function is used in the XPATH query, see example:
//OBJECT[matches(@NAME,'.*[tT]oWn$','i')=1]
if there are some other operations in the XPATH query, then you can use it without comparison, as in the example above
//OBJECT[@NAME and matches(@NAME,'.*[tT]oWn$','i')]
or
//OBJECT[matches(@NAME,'.*[tT]oWn$','i') and 1]
This solution is almost 15 times faster than XML::XPath on very large XML files. And on files of hundreds of megabytes, XML::XPath just dies.
All was need to do it was write and use the registered function correctly.
So regular expressions in XPATH in XML::LibXML can be used!