<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
<fournisseur>
<nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
<adjudicataire>1</adjudicataire>
<montantsoumis>0.000000</montantsoumis>
<montantssoumisunite>0</montantssoumisunite>
<montantcontrat>89732.240000</montantcontrat>
<montanttotalcontrat>0.000000</montanttotalcontrat>
</fournisseur>
</fournisseurs>
</avis>
So there is avis, avis has fournisseurs, fournisseurs has further nodes. How do I get these values to a data frame?
I am using the below code
element_tree = ET.parse('D:\\python_script\\temp2\\AvisRevisions_20200201_20200229.xml')
root = element_tree.getroot()
for child in root.findall('.//avis/*/*/*'):
or
for child in root.findall('.//avis/*'):
but it only gets me either the parent nodes or child nodes, not all of them.
CodePudding user response:
Since your data is not flat it might cause issues when you import the xml directly into pandas. In this scenario a library like pandas_read_xml
might be useful:
import pandas_read_xml as pdx
df = pdx.read_xml(xml)
df = pdx.fully_flatten(df) # this should get you the structure you want
In the line above, the xml variable is your "AvisRevisions_20200201_20200229.xml" file.
For a flatter structure you could use Pandas as such:
import pandas as pd
df = pd.read_xml(xml, xpath="//fournisseurs")
If you are looking for the entire "avis" section, you can replace the xpath parameter by:
df = pd.read_xml(xml, xpath="//avis")
From this, pandas should create the dataframe with appropriate columns. Here is a link to the Pandas docs.
CodePudding user response:
Try the below
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
<fournisseur>
<nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
<adjudicataire>1</adjudicataire>
<montantsoumis>0.000000</montantsoumis>
<montantssoumisunite>0</montantssoumisunite>
<montantcontrat>89732.240000</montantcontrat>
<montanttotalcontrat>0.000000</montanttotalcontrat>
</fournisseur>
</fournisseurs>
</avis>'''
root = ET.fromstring(xml)
data = []
fournisseur = root.find('.//fournisseur')
data.append({e.tag:e.text for e in fournisseur})
df = pd.DataFrame(data)