How can I export the XML file structure into pandas

The structure of the data is shown below; this is an XML file:

<ROOT>
    <data>
        <record>
            <field name="Country or Area">Afghanistan</field>
            <field name="Year">2020</field>
            <field name="Item">Gross Domestic Product (GDP)</field>
            <field name="Value">508.453721937094</field>
        </record>
        <record>
            <field name="Country or Area">Afghanistan</field>
            <field name="Year">2019</field>
            <field name="Item">Gross Domestic Product (GDP)</field>
            <field name="Value">496.940552822825</field>
        </record>
    </data>
</ROOT>

I have tried the following, along with other methods, but had no luck:


   import pandas as pd
   from lxml import objectify

   xml = objectify.parse('GDP_pc.xml')
   root = xml.getroot()

   data=[]
   for i in range(len(root.getchildren())):
       data.append([child.text for child in root.getchildren()[i].getchildren()])

   df = pd.DataFrame(data)
   df.columns = ['Country or Area', 'Year', 'Item', 'Value',]

CodePudding user response：

Have you tried the pandas function pd.read_xml()?

It reads an XML file and transforms it into a DataFrame.

Just do the following:

df = pd.read_xml('GDP_pc.xml')

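You can read more about it in the official documentation. Note that by default read_xml treats the direct children of the root element as rows (xpath='./*'), so a more deeply nested file like this one may need the xpath argument.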
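For the layout in the question, each value sits in a <field> element two levels below the root, so the default call may not give the four-column table directly. One possible workaround, sketched here under the assumption of pandas 1.3 or later, is to select every <field> with xpath and then reshape the result. The names long_df and record are illustrative only, parser="etree" is chosen just to avoid needing lxml, and numbering the records with cumcount assumes every record carries each field name exactly once.

```python
import io

import pandas as pd

xml = """<ROOT>
    <data>
        <record>
            <field name="Country or Area">Afghanistan</field>
            <field name="Year">2020</field>
            <field name="Item">Gross Domestic Product (GDP)</field>
            <field name="Value">508.453721937094</field>
        </record>
        <record>
            <field name="Country or Area">Afghanistan</field>
            <field name="Year">2019</field>
            <field name="Item">Gross Domestic Product (GDP)</field>
            <field name="Value">496.940552822825</field>
        </record>
    </data>
</ROOT>"""

# One row per <field>: the "name" attribute plus the element text,
# which read_xml stores in a column named after the tag ("field").
long_df = pd.read_xml(io.StringIO(xml), xpath=".//field", parser="etree")

# Assumption: each record has every field name once, so the n-th
# occurrence of a given name belongs to record n.
long_df["record"] = long_df.groupby("name").cumcount()

# Reshape from one row per field to one row per record.
df = long_df.pivot(index="record", columns="name", values="field")
df = df[["Country or Area", "Year", "Item", "Value"]]
df.columns.name = None
print(df)
```

With the real file, io.StringIO(xml) would be replaced by the path 'GDP_pc.xml'. The values stay as text after the reshape, so numeric columns would still need converting.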

CodePudding user response：

See below

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<ROOT>
    <data>
        <record>
            <field name="Country or Area">Afghanistan</field>
            <field name="Year">2020</field>
            <field name="Item">Gross Domestic Product (GDP)</field>
            <field name="Value">508.453721937094</field>
        </record>
        <record>
            <field name="Country or Area">Afghanistan</field>
            <field name="Year">2019</field>
            <field name="Item">Gross Domestic Product (GDP)</field>
            <field name="Value">496.940552822825</field>
        </record>
    </data>
</ROOT>'''

data = []
root = ET.fromstring(xml)
# Each <record> becomes one row
for rec in root.findall('.//record'):
    # Map each field's "name" attribute to its text content
    data.append({field.attrib['name']: field.text for field in rec.findall('field')})
df = pd.DataFrame(data)
print(df)

output

  Country or Area  Year                          Item             Value
0     Afghanistan  2020  Gross Domestic Product (GDP)  508.453721937094
1     Afghanistan  2019  Gross Domestic Product (GDP)  496.940552822825
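Since the question reads from a file rather than a string, the same idea can be wrapped in a small helper. This is only a sketch: records_to_frame is a hypothetical name, and the Year and Value conversions assume those columns exist and hold numeric text, as in the sample data. ElementTree returns every value as a string, which is why the conversion step is there.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def records_to_frame(source):
    """Build a DataFrame from <record>/<field name="..."> XML.

    `source` is a file path or an open file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    rows = [
        {field.get("name"): field.text for field in rec.findall("field")}
        for rec in root.iter("record")
    ]
    df = pd.DataFrame(rows)
    # Values arrive as strings; convert the numeric columns explicitly
    # (column names assumed from the sample data).
    df["Year"] = df["Year"].astype(int)
    df["Value"] = df["Value"].astype(float)
    return df


# Demo with an in-memory sample; with the real file this would be
# records_to_frame('GDP_pc.xml').
sample = io.StringIO(
    "<ROOT><data><record>"
    '<field name="Country or Area">Afghanistan</field>'
    '<field name="Year">2020</field>'
    '<field name="Item">Gross Domestic Product (GDP)</field>'
    '<field name="Value">508.453721937094</field>'
    "</record></data></ROOT>"
)
df = records_to_frame(sample)
print(df.dtypes)
```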