Home > Enterprise >  How do I correctly display xml structure with pandas?
How do I correctly display xml structure with pandas?

Time:04-05

I'm looking for some insights on how to properly display this xml:

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT>
        <SUPPLIER>015</SUPPLIER>
        <PRODUCT_DETAILS>
            <KEYWORD>Paper</KEYWORD>
            <PRODUCT_TYPE>major</PRODUCT_TYPE>
        </PRODUCT_DETAILS>
        <PRODUCT_FEATURES>
            <REFERENCE>Class01</REFERENCE>
            <FEATURE>
                <FNAME>Colour</FNAME>
                <FVALUE>white</FVALUE>
            </FEATURE>
        </PRODUCT_FEATURES>
    </PRODUCT>
</HEADER>

For a simpler structure, that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT_DETAILS>
        <KEYWORD>Paper</KEYWORD>
        <PRODUCT_TYPE>major</PRODUCT_TYPE>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>white</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>

I've written a few lines that look like this:

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('file.xml')
root = tree.getroot()

df = pd.DataFrame()

for i in range(0, len(root), 2):

    details = [(child.tag, child.text) for child in root[i   0]]
    features = [(child[0].text, child[1].text) for child in root[i   1]]

    temp_df = pd.DataFrame([[i[1] for i in details   features]], columns=[i[0] for i in details   features])

    df = pd.concat([df, temp_df])

df

# df.to_csv("file_export.csv", index=False)

... and produce this output:

    KEYWORD PRODUCT_TYPE    Colour
0   Paper   major           white

What edits do I need to make, to output this:

    SUPPLIER    KEYWORD PRODUCT_TYPE    REFERENCE   Colour
0   015         Paper   major           Class01     white

Thank you for your help!

Best, ~C

CodePudding user response:

The below will do the job

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT>
        <SUPPLIER>015</SUPPLIER>
        <PRODUCT_DETAILS>
            <KEYWORD>Paper</KEYWORD>
            <PRODUCT_TYPE>major</PRODUCT_TYPE>
        </PRODUCT_DETAILS>
        <PRODUCT_FEATURES>
            <REFERENCE>Class01</REFERENCE>
            <FEATURE>
                <FNAME>Colour</FNAME>
                <FVALUE>white</FVALUE>
            </FEATURE>
        </PRODUCT_FEATURES>
    </PRODUCT>
</HEADER>'''

elements = ['SUPPLIER','KEYWORD','PRODUCT_TYPE','REFERENCE','FNAME','FVALUE']
root = ET.fromstring(xml)
data = {e:root.find(f'.//{e}').text for e in elements}
data[data['FNAME']] = data['FVALUE']
del data['FVALUE']
del data['FNAME']

df = pd.DataFrame([data])
print(df)

output

  SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0      015   Paper        major   Class01  white
  • Related