I'm looking for some insights on how to properly display this xml:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT>
<SUPPLIER>015</SUPPLIER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<REFERENCE>Class01</REFERENCE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</PRODUCT>
</HEADER>
For a simpler structure, that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
I've written a few lines that look like this:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('file.xml')
root = tree.getroot()
df = pd.DataFrame()
for i in range(0, len(root), 2):
details = [(child.tag, child.text) for child in root[i 0]]
features = [(child[0].text, child[1].text) for child in root[i 1]]
temp_df = pd.DataFrame([[i[1] for i in details features]], columns=[i[0] for i in details features])
df = pd.concat([df, temp_df])
df
# df.to_csv("file_export.csv", index=False)
... and produce this output:
KEYWORD PRODUCT_TYPE Colour
0 Paper major white
What edits do I need to make, to output this:
SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0 015 Paper major Class01 white
Thank you for your help!
Best, ~C
CodePudding user response:
The below will do the job
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT>
<SUPPLIER>015</SUPPLIER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<REFERENCE>Class01</REFERENCE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</PRODUCT>
</HEADER>'''
elements = ['SUPPLIER','KEYWORD','PRODUCT_TYPE','REFERENCE','FNAME','FVALUE']
root = ET.fromstring(xml)
data = {e:root.find(f'.//{e}').text for e in elements}
data[data['FNAME']] = data['FVALUE']
del data['FVALUE']
del data['FNAME']
df = pd.DataFrame([data])
print(df)
output
SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0 015 Paper major Class01 white