I have an XSD file that looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema version="1.0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="TEST">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Content1" type="xs:integer"/>
        <xs:element name="Content2" type="xs:string"/>
        <xs:element name="Content3" type="xs:string"/>
        <xs:element name="Content4" type="xs:string"/>
        <xs:element name="Content5" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
I want to export each row of my data frame into a separate XML file using df.to_xml(). I am using:
for row in range(df.shape[0]):
    df1 = df.iloc[row:row + 1]
    df1.to_xml(f"{base_path}/{filename}", root_name="TEST", index=False)
Currently the output looks like this:
<?xml version='1.0' encoding='utf-8'?>
<TEST>
  <row>
    <Content1>123</Content1>
    <Content2>abc</Content2>
    <Content3>242136</Content3>
    <Content4>90°</Content4>
  </row>
</TEST>
My problem is the <row> and </row> lines. How can I prevent them from being created? Alternatively, I could name the row element TEST and suppress the root element, if that is possible.
DataFrame.to_xml always creates both a root and a row element, but I need only one of them. How can I make the output contain only one of them?
CodePudding user response:
Another option would be using an XSLT stylesheet:
xslt = '''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml" />
  <!-- Replace each row element with its children -->
  <xsl:template match="row">
    <xsl:apply-templates select="*"/>
  </xsl:template>
  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
'''
for row in range(df.shape[0]):
    df1 = df.iloc[row:row + 1]
    df1.to_xml(f"{base_path}/{filename}", root_name="TEST", index=False, stylesheet=xslt)
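The stylesheet is applied to the XML that to_xml generates. Its identity template copies every node unchanged, while the template matching "row" emits only that element's children, so the <row> wrapper is dropped and the Content elements end up directly under <TEST>.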
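To make the effect of the transform concrete, here is a rough equivalent without XSLT, using only the standard library's xml.etree.ElementTree. This is a sketch of the same hoisting step, not what pandas does internally; the helper name unwrap_rows is hypothetical, and the sample XML is a shortened version of the output shown in the question.

```python
import xml.etree.ElementTree as ET


def unwrap_rows(xml_text: str, row_name: str = "row") -> str:
    """Move the children of every <row> element into its parent and drop the wrapper.

    Hypothetical helper for illustration; only direct children of the root are handled.
    """
    root = ET.fromstring(xml_text)
    for row in list(root.findall(row_name)):
        # Remember where the wrapper sat so the children keep their position.
        position = list(root).index(row)
        root.remove(row)
        for offset, child in enumerate(list(row)):
            root.insert(position + offset, child)
    return ET.tostring(root, encoding="unicode")


# Shortened sample of the XML that to_xml produced in the question.
sample = """<TEST>
  <row>
    <Content1>123</Content1>
    <Content2>abc</Content2>
  </row>
</TEST>"""

flattened = unwrap_rows(sample)
```

The XSLT route has the advantage that to_xml writes the final file in one call; a post-processing helper like this one would need the XML as a string first and a separate write afterwards.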
CodePudding user response:
Consider converting the entire data frame to XML once and then extracting the child elements one by one with lxml (which you likely have installed, since it is the default parser of read_xml and to_xml). Notice the use of the row_name argument, which names each row element TEST. The loop below uses enumerate for file naming.
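import os
import lxml.etree as lx
...
# Encode to bytes: lxml rejects str input that carries an encoding declaration
data = lx.fromstring(df.to_xml(row_name="TEST", index=False).encode("utf-8"))
for n, test in enumerate(data.xpath("//TEST"), start=1):
    xmlfile = os.path.join(base_path, f"TEST_{n}.xml")
    # Write each TEST element as its own document
    lx.ElementTree(test).write(xmlfile, encoding="utf-8", xml_declaration=True)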
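If lxml is not available, a similar result can be sketched with the standard library alone by building the TEST element directly for each row. Note that this swaps in xml.etree.ElementTree in place of to_xml. The records list below is an assumption standing in for df.to_dict(orient="records"), and row_to_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def row_to_xml(record: dict, root_name: str = "TEST") -> bytes:
    """Serialize one record as a standalone document with a single root element.

    Hypothetical helper for illustration; column names must be valid XML names.
    """
    root = ET.Element(root_name)
    for column, value in record.items():
        if value is None:
            # Leave out missing values, e.g. the optional Content5 element.
            continue
        ET.SubElement(root, column).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Stand-in for df.to_dict(orient="records"); values taken from the question.
records = [
    {"Content1": 123, "Content2": "abc", "Content3": "242136",
     "Content4": "90°", "Content5": None},
]

documents = [row_to_xml(record) for record in records]
```

Each entry in documents is a bytes object that can be written with open(path, "wb"). Missing values coming from pandas are usually NaN rather than None, so a real data frame would likely need an additional check before the skip.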