I am attempting to use xmltodict to parse XML in the hope of eventually converting it to a more readable table format for others. I have been able to get through most of the XML, but when I come to an element with multiple subelements, I feel I am chasing my tail. My hope is to use pandas with the values I extract from the XML...
Here is a sanitized version of the XML I am attempting to parse:
<batchConfiguration>
    <batchJob name="BATCHJOB1">
        <className>batchJob1</className>
        <schedule>Y</schedule>
        <interval>300</interval>
        <systemControlled>N</systemControlled>
    </batchJob>
    <batchJob name="BATCHJOB2">
        <params>
            <param name="QueueName1">batchQueue1</param>
        </params>
        <className>batchJob2</className>
        <startTime>02:10:00</startTime>
        <schedule>N</schedule>
        <daysOfTheWeek>YYYYYYY</daysOfTheWeek>
        <systemControlled>N</systemControlled>
    </batchJob>
    <batchJob name="BATCHJOB3">
        <params>
            <param name="ignoreErrors">Y</param>
            <param name="batchSize">1000</param>
        </params>
        <className>classyBatchJob</className>
        <schedule>Y</schedule>
        <interval>90</interval>
        <systemControlled>N</systemControlled>
    </batchJob>
</batchConfiguration>
My thought was that I could somehow loop through the entries where there are multiple "param" elements. I can return a single "param" entry, but I am stumped when there are multiple. Here is my code to date. It has bits and pieces where I try to figure things out as I go. The XML is read from a file...
import xmltodict as xml
import pprint
#File to parse
fileptr=open(r"FileIRead.xml")
# Show raw XML text file data
raw_file= fileptr.read()
# print(raw_file)
# Create an XML dictionary
xml_dict=xml.parse(raw_file)
pprint.pprint(xml_dict)
xml_dict1=xml.parse(raw_file)['batchConfiguration']['batchJob']
pprint.pprint(xml_dict1)
# pprint.pprint(xml_dict['batchConfiguration']['batchJob'])
# https://docs.python.org/3/tutorial/errors.html
for bJ in xml_dict1:
    bJName = bJ['@name']
    print(f"Name: {bJ['@name']}")
    print(bJName)
    try:
        print(f"Interval: {bJ['interval']}")
    except:
        print("Interval: N/A")
    try:
        print(f"Scheduled: {bJ['schedule']}")
    except:
        print("N/A")
    try:
        print(f"Start Time: {bJ['startTime']}")
    except:
        print("Start Time: N/A")
    try:
        print(f"End Time: {bJ['endTime']}")
    except:
        print("End Time: N/A")
    try:
        # This works fine to return only a single element. With multiple it fails.
        print(f"Params: {bJ['params']['param']['@name']} - {bJ['params']['param']['#text']}")
    except:
        print("Params: N/A")
    try:
        print(f"Classname: {bJ['className']}")
    except:
        print("Classname: N/A")
    try:
        print(f"DaysOfWeek: {bJ['daysOfTheWeek']}")
    except:
        print("DaysOfWeek: N/A")
    try:
        # Attempt to get all parameters single or multiple
        xml_dict2 = xml.parse(raw_file)['params']['param']
        pprint.pprint(xml_dict2)
        for bJ1 in xml_dict2['params']['param']:
            print(f"--- {bJ1['@name']}")
    except:
        print("It no worky")
Edit: By request... The output I have been able to get is:
Name: BATCHJOB1
Classname: batchJob1
... (etc)
My end goal is to take the output and put it into column format something like this:
Name       Classname  ...
BATCHJOB1  batchJob1
"N/A" would be placed where the element does not exist or has no value.
CodePudding user response:
xmltodict returns a dict when there is only one param element, but a list when there are two or more. There is a force_list parameter to .parse that lets you name keys whose values should always be lists.
You could use:
xml_dict1 = xml.parse(raw_file, force_list=('param',))['batchConfiguration']['batchJob']
Then:
try:
    for p in bJ['params']['param']:
        print(f"Params: {p['@name']} - {p['#text']}")
except KeyError:  # recommend never using a bare 'except'
    print("Params: N/A")
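If changing the parse call is not an option, the same dict-versus-list difference can be smoothed over after parsing. The following is only a minimal sketch and not part of xmltodict: as_list and job_to_row are hypothetical helper names, and the jobs literal is hand-written to mirror the shape xmltodict produces for the sample XML (attributes under '@name', text under '#text'). It also uses dict.get with a fallback in place of the repeated try/except blocks.

```python
def as_list(value):
    """Wrap value in a list unless it already is one (None becomes [])."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def job_to_row(job, fields=("className", "schedule", "interval", "startTime")):
    """Flatten one batchJob dict into a row, using "N/A" for missing elements."""
    row = {"name": job.get("@name", "N/A")}
    for field in fields:
        # Covers both a missing key and an empty element (parsed as None).
        row[field] = job.get(field) or "N/A"
    # 'params' may be absent, hold one dict, or hold a list of dicts.
    params = as_list((job.get("params") or {}).get("param"))
    row["params"] = "; ".join(
        f"{p['@name']}={p.get('#text', '')}" for p in params
    ) or "N/A"
    return row


# Hand-written stand-in for xml.parse(raw_file)['batchConfiguration']['batchJob']
jobs = [
    {"@name": "BATCHJOB1", "className": "batchJob1",
     "schedule": "Y", "interval": "300"},
    {"@name": "BATCHJOB2",
     "params": {"param": {"@name": "QueueName1", "#text": "batchQueue1"}},
     "className": "batchJob2", "startTime": "02:10:00", "schedule": "N"},
    {"@name": "BATCHJOB3",
     "params": {"param": [
         {"@name": "ignoreErrors", "#text": "Y"},
         {"@name": "batchSize", "#text": "1000"},
     ]},
     "className": "classyBatchJob", "schedule": "Y", "interval": "90"},
]

rows = [job_to_row(job) for job in jobs]
for row in rows:
    print(row)
```

The same as_list wrapper would also help if the file ever contained a single batchJob, since that key is subject to the same dict-or-list behaviour. Because each row is a plain dict, the list can be handed to pandas.DataFrame(rows) for the final table.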
CodePudding user response:
If I understand you correctly, this can be accomplished by using pandas.read_xml():
import pandas as pd
pd.read_xml([your_xml]).iloc[:,0:2]
Output, based on your sample xml:
        name       className
0  BATCHJOB1       batchJob1
1  BATCHJOB2       batchJob2
2  BATCHJOB3  classyBatchJob
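The flat child elements come out well this way, but the nested param entries and the "N/A" placeholders from the question need a little more work. Below is a rough sketch of one way to get there using only the standard library's xml.etree.ElementTree (a different parser from the xmltodict used in the question), with a shortened copy of the sample XML inlined. SAMPLE_XML, COLUMNS and build_rows are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample XML from the question.
SAMPLE_XML = """\
<batchConfiguration>
  <batchJob name="BATCHJOB1">
    <className>batchJob1</className>
    <schedule>Y</schedule>
    <interval>300</interval>
  </batchJob>
  <batchJob name="BATCHJOB3">
    <params>
      <param name="ignoreErrors">Y</param>
      <param name="batchSize">1000</param>
    </params>
    <className>classyBatchJob</className>
    <schedule>Y</schedule>
    <interval>90</interval>
  </batchJob>
</batchConfiguration>
"""

COLUMNS = ("className", "schedule", "interval", "startTime")


def build_rows(xml_text):
    """Return one dict per batchJob, with "N/A" where an element is missing."""
    rows = []
    for job in ET.fromstring(xml_text).iter("batchJob"):
        row = {"name": job.get("name", "N/A")}
        for col in COLUMNS:
            # findtext returns the default for a missing element;
            # the `or` also covers elements that exist but are empty.
            row[col] = job.findtext(col, default="N/A") or "N/A"
        # findall returns an empty list when there is no <params> block.
        params = [f"{p.get('name')}={p.text}" for p in job.findall("params/param")]
        row["params"] = "; ".join(params) or "N/A"
        rows.append(row)
    return rows


rows = build_rows(SAMPLE_XML)
header = ["name", *COLUMNS, "params"]
print("  ".join(f"{h:<15}" for h in header))
for row in rows:
    print("  ".join(f"{row[h]:<15}" for h in header))
```

Each row is a plain dict, so pandas.DataFrame(rows) should give the same table if pandas is preferred for the final formatting.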