How to generate an XML tag without the trailing "/"?

Time:07-16

I'm trying to generate an XML document, specifically one whose root element looks like this:

<spirit:component xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4"
                xmlns:vendorExtensions="$IREG_GEN/XMLSchema/SPIRIT"     
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   
                xsi:schemaLocation="$IREG_GEN/XMLSchema/SPIRIT/VendorExtensions.xsd 
                                    http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 
                                    http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4/index.xsd">

So I wrote a Perl script for it:

use strict;
use warnings;

use Spreadsheet::ParseXLSX;
use XML::LibXML;
my $doc = XML::LibXML::Document->new('1.0', 'utf-8');
my $root = $doc->createElement('spirit:component');
#$root->appendChild($doc->createComment("JJ"));
$root->setAttribute('xmlns:spirit'=> "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4");
$root->setAttribute('xmlns:vendorExtensions'=> "\$IREG_GEN/XMLSchema/SPIRIT");
$root->setAttribute('xmlns:xsi'=> "http://www.w3.org/2001/XMLSchema-instance");
$root->setAttribute('xsi:schemaLocation'=> "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 
                                            http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 
                                            http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4/index.xsd");

$doc->setDocumentElement($root);
print $doc->toString(1);

But the problem is that I get this result:

<spirit:component xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4" xmlns:vendorExtensions="$IREG_GEN/XMLSchema/SPIRIT" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 &#10;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 &#10;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;&#9;http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4/index.xsd"/>

There are two problems here: the &#9; entities, and the "/" at the end of index.xsd"/>.

I can fix the first one by removing the newlines, like this:

$root->setAttribute('xsi:schemaLocation'=> "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4/index.xsd");

But how can I remove the "/" in index.xsd"/>? Am I using the wrong function?

Answer:

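In XML, an element without any children or other content can be, and typically is, written as a single empty-element tag, <foo/>, instead of <foo></foo>; the two forms mean exactly the same thing. It has to be one or the other, though: unlike HTML, XML requires every start tag to be closed. So there's nothing wrong with that part of the output. Your root element is simply empty so far, and once you append child elements to $root the serializer will write a separate </spirit:component> end tag.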
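A minimal sketch of that behaviour with XML::LibXML, using a hypothetical bare "component" element (and a made-up "name" child) instead of the SPIRIT namespace to keep it short. It shows the empty-element form, the package variable $XML::LibXML::setTagCompression (which, per the XML::LibXML documentation, makes the serializer write <tag></tag> for empty elements), and the end tag that appears once the element has content.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical minimal document: one root element with no content yet.
my $doc  = XML::LibXML::Document->new('1.0', 'utf-8');
my $root = $doc->createElement('component');
$doc->setDocumentElement($root);

# No children, so libxml2 uses the empty-element form.
print $root->toString(), "\n";    # <component/>

# Temporarily ask the serializer for a separate end tag on empty elements.
{
    local $XML::LibXML::setTagCompression = 1;
    print $root->toString(), "\n";
}

# Once the element has content, an end tag is written anyway.
$root->appendTextChild('name', 'example');
print $root->toString(), "\n";    # <component><name>example</name></component>
```

Setting the variable is rarely needed, since any XML parser treats both forms identically; it mostly matters when the output is compared as plain text.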

As for the text of the xsi:schemaLocation attribute (which should have an even number of entries, since it's a list of namespace URI and schema URL pairs): &#9; is a tab. Replace the tabs with spaces and those won't be encoded. The newlines still will be, though. According to an answer to a Stack Overflow question on whether newlines are valid in attribute values, a conforming XML parser converts character references such as &#10; and &#9; back into the characters they stand for when a program requests the attribute's content. It's literal, unescaped whitespace in an attribute that a parser normalizes to plain spaces, which is why the serializer escapes it. And since xsi:schemaLocation is a whitespace-separated list, the extra newlines and tabs only act as separators. So while it looks ugly, in practice with conforming XML parsers, what you have shouldn't cause issues.
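If you'd rather have a clean attribute anyway, here is a minimal sketch that keeps the namespace/schema pairs together in Perl and joins every entry with a single space, so there are no newlines or tabs to escape. It assumes the intended pairs are the SPIRIT 1.4 namespace with its index.xsd and the vendorExtensions namespace with VendorExtensions.xsd; that second pairing is a guess based on the target output in the question. It also swaps in XML::LibXML's namespace-aware createElementNS, setNamespace and setAttributeNS in place of setting the xmlns:* declarations as plain attributes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

my $SPIRIT_NS = 'http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4';
my $VENDOR_NS = '$IREG_GEN/XMLSchema/SPIRIT';    # single quotes keep the literal "$"
my $XSI_NS    = 'http://www.w3.org/2001/XMLSchema-instance';

# xsi:schemaLocation holds (namespace URI, schema URL) pairs.
# ASSUMPTION: the vendorExtensions pairing is a guess.
my @schema_pairs = (
    [ $VENDOR_NS, "$VENDOR_NS/VendorExtensions.xsd" ],
    [ $SPIRIT_NS, "$SPIRIT_NS/index.xsd" ],
);

# Flatten the pairs and separate every entry with exactly one space,
# so the value contains no newlines or tabs for the serializer to escape.
my $schema_location = join ' ', map { @$_ } @schema_pairs;

my $doc  = XML::LibXML::Document->new('1.0', 'utf-8');
my $root = $doc->createElementNS($SPIRIT_NS, 'spirit:component');
$doc->setDocumentElement($root);

# Declare xmlns:vendorExtensions without putting the element in that namespace.
$root->setNamespace($VENDOR_NS, 'vendorExtensions', 0);

# setAttributeNS declares xmlns:xsi on the element if it is not in scope yet.
$root->setAttributeNS($XSI_NS, 'xsi:schemaLocation', $schema_location);

print $doc->toString(1);
```

The root element will still be written as <spirit:component .../> until it gets children, for the reason given above.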

To test this, I piped the output of your script into this one:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use XML::LibXML;

my $dom = XML::LibXML->load_xml({ IO => \*STDIN });
my $root = $dom->documentElement();
for my $attr ($root->attributes()) {
    say $attr->name(), " is ", $attr->getValue();
}

prints out

schemaLocation is http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 
                                            http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 
                                            http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4/index.xsd
xmlns:spirit is http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4
xmlns:vendorExtensions is $IREG_GEN/XMLSchema/SPIRIT
xmlns:xsi is http://www.w3.org/2001/XMLSchema-instance

so that seems to be true with libxml2, at least.
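Because xsi:schemaLocation is a whitespace-separated list, what a consumer ultimately cares about is the sequence of tokens, not the exact whitespace between them. A small sketch of that check, parsing a trimmed-down copy of the question's output (only one namespace/schema pair is kept, and the vendorExtensions declaration is dropped, purely to keep it short) and splitting the value with Perl's split ' ', which breaks on any run of whitespace:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

my $XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance';

# Trimmed-down copy of the question's output, with the encoded
# newline (&#10;) and tabs (&#9;) still inside schemaLocation.
my $xml = <<'XML';
<spirit:component xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4 &#10;&#9;&#9;http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.4/index.xsd"/>
XML

my $root  = XML::LibXML->load_xml(string => $xml)->documentElement();
my $value = $root->getAttributeNS($XSI_NS, 'schemaLocation');

# split ' ' splits on any run of spaces, tabs or newlines and ignores
# leading whitespace, so the escaped characters only separate tokens.
my @tokens = split ' ', $value;
print scalar(@tokens), " tokens:\n";
print "  $_\n" for @tokens;
```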
