I need to extract the nodes which are having RNAME- Applicable and their corresponding accounts.
<Reports>
<Report>
<ID>A1234</ID>
<RNAME>Not Applicable</RNAME>
<Accounts>
<Account>
<AID>12345</AID>
<A2>TEST1</A2>
</Account>
</Accounts>
</Report>
<Report>
<ID>A12345</ID>
<RNAME>Applicable</RNAME>
<Accounts>
<Account>
<AID>123456789</AID>
<A2>TEST2</A2>
</Account>
</Accounts>
</Report>
</Reports>
Here is the expected output for data frame
ID | AID | A2 |
---|---|---|
A12345 | 123456789 | TEST2 |
Here is my code:
for x in ET.fromstring(sample.xml).findall(".//Report"):
RNAME = x.find("RNAME").text
if RNAME != "Not Applicable":
ID = x.find("ID").text print(ID) # It is printing A12345
for y in ET.fromstring(content).findall('.//Report/Accounts/Account'):
AID = y.find("AID").text
A2 = y.find("A2").text
print(AID, A2) # it is printing all 12345,TEST1 and 12345678 ,TEST2
CodePudding user response:
first install xmltodict
pip install xmltodict
then convert xml into json
import json
import xmltodict
with open("test.xml") as xml_file:
data_dict = xmltodict.parse(xml_file.read())
json_data = json.dumps(data_dict)
### here you have the data as json you can easily deal with them and write them to csv file ###
CodePudding user response:
Or you can try parsing the XML like this:
import pandas as pd
import xml.etree.ElementTree as ET
root = ET.fromstring('<Reports> <Report> <ID>A1234</ID> <RNAME>Not Applicable</RNAME> <Accounts> <Account> <AID >12345</AID > <A2>TEST1</A2> </Account> </Accounts> </Report> <Report> <ID>A12345</ID> <RNAME>Applicable</RNAME> <Accounts> <Account> <AID >123456789</AID > <A2>TEST2</A2> </Account> </Accounts> </Report> </Reports>')
mydata = pd.DataFrame(data = {'id': [r.text for r in root.findall('.//ID')],
'aid': [r.text for r in root.findall('.//AID')],
'a2': [r.text for r in root.findall('.//A2')]})
mydata.to_csv(path_here)