I am attempting to convert account information from a large xml file to csv using Python. This has largely been successful, however the Python script is dropping the leading zeros from account numbers and aligning the truncated number to the right. For example an account number of 007 is cropped to just 7. The account numbers can be numerical, strings or alphanumerical.
This is the current script:
import pandas as pd
import xml.etree.ElementTree as ET
# Parse the XML file and find the root
xml_file = "C:\\Python Scripts\\test.xml"
csv_file = "C:\\Python Scripts\\test.csv"
xml_tree = ET.parse(xml_file)
root = xml_tree.getroot()
# Convert parsed xml file to a csv
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv(csv_file)
Below is the xml file sample, test.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<TSAutoUpload xsi:noNamespaceSchemaLocation="tsautoup.xsd"
xmlns:xsi="http:www.w3.org/2011/XMLSchema-instance">
<Firm>
<AcctNr>TEST</AcctNr>
<LongName>TEST ACCOUNT</LongName>
</Firm>
<Firm>
<AcctNr>007</AcctNr>
<LongName>JAMES BOND INC</LongName>
</Firm>
</TSAutoUpload>
This is the test.csv output, noting the truncation of the leading zeros. The 7 is also aligned to the right, ideally it should read 007, aligned to the left:
AcctNr | LongName | |
---|---|---|
0 | TEST | TEST ACCOUNT |
1 | 7 | JAMES BOND INC |
Please let me know how I can modify the script to keep the leading zeros without loosing any functionality?
Also, of lesser importance, how may the 007 be aligned to the left instead of right please?
CodePudding user response:
Without any external lib
import xml.etree.ElementTree as ET
import csv
xml = '''<TSAutoUpload xsi:noNamespaceSchemaLocation="tsautoup.xsd"
xmlns:xsi="http:www.w3.org/2011/XMLSchema-instance">
<Firm>
<AcctNr>0087</AcctNr>
<LongName>TEST ACCOUNT</LongName>
</Firm>
<Firm>
<AcctNr>007</AcctNr>
<LongName>JAMES BOND INC</LongName>
</Firm>
</TSAutoUpload>'''
root = ET.fromstring(xml)
firms = [f for f in root.findall('.//Firm')]
data = [{c.tag:c.text for c in list(f)} for f in firms]
with open('out.csv','w') as f:
csv_writer = csv.DictWriter(f, list(data[0].keys()))
csv_writer.writeheader()
csv_writer.writerows(data)
out.csv
AcctNr,LongName
0087,TEST ACCOUNT
007,JAMES BOND INC
CodePudding user response:
As <Firm>
is a child nodes of root node <TSAutoUpload>
you can just iterate over root node and write text of each child node of <Firm>
as separate column
Code:
import xml.etree.ElementTree as ET
import csv
# Parse the XML file and find the root
xml_file = r"C:\Python Scripts\test.xml"
csv_file = r"C:\Python Scripts\test.csv"
xml_tree = ET.parse(xml_file)
root = xml_tree.getroot()
with open(csv_file, "w", newline="") as f:
writer = csv.writer(f)
for firm in root:
writer.writerow(node.text for node in firm)