Home > other >  How to retain leading 0's when converting xml to csv using Python / elementtree / pandas
How to retain leading 0's when converting xml to csv using Python / elementtree / pandas

Time:10-29

I am attempting to convert account information from a large xml file to csv using Python. This has largely been successful, however the Python script is dropping the leading zeros from account numbers and aligning the truncated number to the right. For example an account number of 007 is cropped to just 7. The account numbers can be numerical, strings or alphanumerical.

This is the current script:

import pandas as pd
import xml.etree.ElementTree as ET

# Parse the XML file and find the root
xml_file = "C:\\Python Scripts\\test.xml"
csv_file = "C:\\Python Scripts\\test.csv"
xml_tree = ET.parse(xml_file)
root = xml_tree.getroot()

# Convert parsed xml file to a csv
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv(csv_file)

Below is the xml file sample, test.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<TSAutoUpload xsi:noNamespaceSchemaLocation="tsautoup.xsd"
    xmlns:xsi="http:www.w3.org/2011/XMLSchema-instance">
    <Firm>
        <AcctNr>TEST</AcctNr>
        <LongName>TEST ACCOUNT</LongName>
    </Firm>
    <Firm>
        <AcctNr>007</AcctNr>
        <LongName>JAMES BOND INC</LongName>
    </Firm>
</TSAutoUpload>

This is the test.csv output, noting the truncation of the leading zeros. The 7 is also aligned to the right, ideally it should read 007, aligned to the left:

AcctNr LongName
0 TEST TEST ACCOUNT
1 7 JAMES BOND INC

Please let me know how I can modify the script to keep the leading zeros without loosing any functionality?
Also, of lesser importance, how may the 007 be aligned to the left instead of right please?

CodePudding user response:

Without any external lib

import xml.etree.ElementTree as ET
import csv


xml = '''<TSAutoUpload xsi:noNamespaceSchemaLocation="tsautoup.xsd"
    xmlns:xsi="http:www.w3.org/2011/XMLSchema-instance">
    <Firm>
        <AcctNr>0087</AcctNr>
        <LongName>TEST ACCOUNT</LongName>
    </Firm>
    <Firm>
        <AcctNr>007</AcctNr>
        <LongName>JAMES BOND INC</LongName>
    </Firm>
</TSAutoUpload>'''

root = ET.fromstring(xml)
firms = [f for f in root.findall('.//Firm')]
data = [{c.tag:c.text for c in list(f)} for f in firms]

with open('out.csv','w') as f:
  csv_writer = csv.DictWriter(f, list(data[0].keys()))
  csv_writer.writeheader()
  csv_writer.writerows(data) 

out.csv

AcctNr,LongName
0087,TEST ACCOUNT
007,JAMES BOND INC

CodePudding user response:

As <Firm> is a child nodes of root node <TSAutoUpload> you can just iterate over root node and write text of each child node of <Firm> as separate column

Code:

import xml.etree.ElementTree as ET
import csv

# Parse the XML file and find the root
xml_file = r"C:\Python Scripts\test.xml"
csv_file = r"C:\Python Scripts\test.csv"
xml_tree = ET.parse(xml_file)
root = xml_tree.getroot()

with open(csv_file, "w", newline="") as f:
    writer = csv.writer(f)
    for firm in root:
        writer.writerow(node.text for node in firm)
  • Related