Home > Software engineering >  Getting text of parent, child and their child
Getting text of parent, child and their child

Time:11-28

<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
  <fournisseur>
    <nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
    <adjudicataire>1</adjudicataire>
    <montantsoumis>0.000000</montantsoumis>
    <montantssoumisunite>0</montantssoumisunite>
    <montantcontrat>89732.240000</montantcontrat>
    <montanttotalcontrat>0.000000</montanttotalcontrat>
  </fournisseur>
</fournisseurs>
</avis>

So there is avis, avis has fournisseurs, fournisseurs has further nodes. How do I get these values to a data frame?

I am using the below code

element_tree = ET.parse('D:\\python_script\\temp2\\AvisRevisions_20200201_20200229.xml')
root = element_tree.getroot()
for child in root.findall('.//avis/*/*/*'):

or

for child in root.findall('.//avis/*'):

but it only gets me either the parent nodes or child nodes, not all of them.

CodePudding user response:

Since your data is not flat it might cause issues when you import the xml directly into pandas. In this scenario a library like pandas_read_xml might be useful:

import pandas_read_xml as pdx

df = pdx.read_xml(xml)
df = pdx.fully_flatten(df)  # this should get you the structure you want

In the line above, the xml variable is your "AvisRevisions_20200201_20200229.xml" file.


For a flatter structure you could use Pandas as such:

import pandas as pd

df = pd.read_xml(xml, xpath="//fournisseurs")

If you are looking for the entire "avis" section, you can replace the xpath parameter by:

df = pd.read_xml(xml, xpath="//avis")

From this, pandas should create the dataframe with appropriate columns. Here is a link to the Pandas docs.

CodePudding user response:

Try the below

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
  <fournisseur>
    <nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
    <adjudicataire>1</adjudicataire>
    <montantsoumis>0.000000</montantsoumis>
    <montantssoumisunite>0</montantssoumisunite>
    <montantcontrat>89732.240000</montantcontrat>
    <montanttotalcontrat>0.000000</montanttotalcontrat>
  </fournisseur>
</fournisseurs>
</avis>'''
root = ET.fromstring(xml)

data = []
fournisseur = root.find('.//fournisseur')
data.append({e.tag:e.text for e in fournisseur})
df = pd.DataFrame(data)
  • Related