"Invalid tag name" error when creating element with lxml in python

Time:12-21

I am using lxml to build an XML file. My sample program is:

from datetime import datetime
from lxml import etree
MESSAGETYPEINDIC = 'CRS701'
REPPERIOD = datetime.now().strftime("%Y-%m-%d")
root = etree.Element("crsdac2:CRS-DAC2-LT", attrib={'xmlns:crsdac2': 'urn:sti:ties:crsdac2:v1', 'xmlns:crs': 'urn:sti:ties:sask:v1','xmlns:xsi':'http://www.w3.org/2001/XMLSchema-instance', 'version':'3.141590118408203125', 'xsi:schemaLocation': 'urn:sti:ties:crsdac2:v1 file:///G:/Tax/Tax Technology/CRS (DAC2)/XML Specifikacija (versija nuo 2020-12)/CRS-DAC2-LT_v0.4.xsd' })
crsDAC2_messageSpec = etree.SubElement(root, "crsdac2:MessageSpec")
crsDAC2_messageSpec_messagetypeindic = etree.SubElement(crsDAC2_messageSpec, "crs:MessageTypeIndic").text = MESSAGETYPEINDIC
crsDAC2_messageSpec_repperiod = etree.SubElement(crsDAC2_messageSpec, "crs:ReportingPeriod").text = REPPERIOD
crsDAC2_messageBody = etree.SubElement(root, "crsdac2:MessageBody")
tree = etree.ElementTree(root)
print(tree)
tree_string = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding='UTF-8', standalone="yes")
print(tree_string)

Running the code above raises the error below. How can I resolve it?

ValueError: Invalid tag name 'crsdac2:CRS-DAC2-LT'

I need the output to look like this:

<?xml version="1.0" encoding="UTF-8"?>
<crsdac2:CRS-DAC2-LT xmlns:crsdac2="urn:sti:ties:crsdac2:v1" xmlns:crs="urn:sti:ties:crstypessti:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.141590118408203125" xsi:schemaLocation="urn:sti:ties:crsdac2:v1 file:///G:/Tax/Tax Technology/CRS (DAC2)/XML Specifikacija (versija nuo 2020-12)/CRS-DAC2-LT_v0.4.xsd">
    <crsdac2:MessageSpec>
        <crs:MessageTypeIndic>CRS701</crs:MessageTypeIndic>
        <crs:ReportingPeriod>2021-12-31</crs:ReportingPeriod>
    </crsdac2:MessageSpec>
    <crsdac2:MessageBody>
    </crsdac2:MessageBody>
</crsdac2:CRS-DAC2-LT>

Answer:

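lxml does not accept a prefixed name such as 'crsdac2:CRS-DAC2-LT' as a tag name. When creating an element or attribute bound to a namespace, you need to use the namespace URI (not the prefix) and declare the prefix-to-URI mapping with nsmap. I suggest using the QName helper class to do this.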
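As a minimal sketch of the rule above: the namespace URI "urn:example:ns" and the prefix "ex" are made up for illustration and are not part of the question. The prefixed string is rejected, while Clark notation ("{uri}local") and QName both describe the same qualified name.

```python
from lxml import etree

NS = "urn:example:ns"  # hypothetical namespace URI, for illustration only

# A prefixed string is rejected: lxml treats the whole string as the tag
# name, and a colon is not allowed there.
try:
    etree.Element("ex:Root")
except ValueError as exc:
    error_message = str(exc)

# Clark notation and QName are two spellings of the same qualified name.
# nsmap tells the serializer which prefix to write for the URI.
clark_root = etree.Element("{%s}Root" % NS, nsmap={"ex": NS})
qname_root = etree.Element(etree.QName(NS, "Root"), nsmap={"ex": NS})

print(error_message)
print(clark_root.tag == qname_root.tag)      # True
print(etree.tostring(qname_root).decode())   # <ex:Root xmlns:ex="urn:example:ns"/>
```

QName is mostly a readability aid here; passing the Clark-notation string directly should behave the same way.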

from lxml.etree import Element, SubElement, QName, tostring
from datetime import datetime

ns1 = "urn:sti:ties:crsdac2:v1"
ns2 = "urn:sti:ties:crstypessti:v1"
ns3 = 'http://www.w3.org/2001/XMLSchema-instance'

xsd = "file:///G:/Tax/Tax Technology/CRS (DAC2)/XML Specifikacija (versija nuo 2020-12)/CRS-DAC2-LT_v0.4.xsd"

MESSAGETYPEINDIC = 'CRS701'
REPPERIOD = datetime.now().strftime("%Y-%m-%d")

root = Element(QName(ns1, "CRS-DAC2-LT"), nsmap={"crsdac2": ns1, "crs": ns2})
# xsi:schemaLocation holds "namespace-URI location" pairs, as in the desired output
root.set(QName(ns3, "schemaLocation"), ns1 + " " + xsd)
root.set("version", "3.141590118408203125")

messageSpec = SubElement(root, QName(ns1, "MessageSpec"))

messageTypeIndic = SubElement(messageSpec, QName(ns2, "MessageTypeIndic"))
messageTypeIndic.text = MESSAGETYPEINDIC

messageSpec_repperiod = SubElement(messageSpec, QName(ns2, "ReportingPeriod"))
messageSpec_repperiod.text = REPPERIOD

messageBody = SubElement(root, QName(ns1, "MessageBody"))

tree_string = tostring(root, pretty_print=True, xml_declaration=True,
                       encoding='UTF-8', standalone="yes")
print(tree_string.decode())

Output:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<crsdac2:CRS-DAC2-LT xmlns:crs="urn:sti:ties:crstypessti:v1" xmlns:crsdac2="urn:sti:ties:crsdac2:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:sti:ties:crsdac2:v1 file:///G:/Tax/Tax Technology/CRS (DAC2)/XML Specifikacija (versija nuo 2020-12)/CRS-DAC2-LT_v0.4.xsd" version="3.141590118408203125">
  <crsdac2:MessageSpec>
    <crs:MessageTypeIndic>CRS701</crs:MessageTypeIndic>
    <crs:ReportingPeriod>2022-12-20</crs:ReportingPeriod>
  </crsdac2:MessageSpec>
  <crsdac2:MessageBody/>
</crsdac2:CRS-DAC2-LT>
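To check that the prefixes are only a serialization detail, the sketch below parses a shortened copy of the document and looks elements up by namespace URI. The prefixes "a" and "b" in the lookup map are arbitrary names chosen for this example; they do not have to match the prefixes used in the file.

```python
from lxml import etree

# Shortened copy of the generated document (attributes omitted for brevity).
xml = (
    b'<crsdac2:CRS-DAC2-LT xmlns:crsdac2="urn:sti:ties:crsdac2:v1" '
    b'xmlns:crs="urn:sti:ties:crstypessti:v1">'
    b'<crsdac2:MessageSpec>'
    b'<crs:MessageTypeIndic>CRS701</crs:MessageTypeIndic>'
    b'</crsdac2:MessageSpec>'
    b'</crsdac2:CRS-DAC2-LT>'
)
root = etree.fromstring(xml)

# Internally the tag is stored with the URI; the prefix is kept separately.
print(root.tag)     # {urn:sti:ties:crsdac2:v1}CRS-DAC2-LT
print(root.prefix)  # crsdac2

# The prefixes in this map are local to the query (arbitrary names).
ns = {"a": "urn:sti:ties:crsdac2:v1", "b": "urn:sti:ties:crstypessti:v1"}
indic = root.find("a:MessageSpec/b:MessageTypeIndic", namespaces=ns)
print(indic.text)   # CRS701
```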