I have following toy example code to read an xml using pandas
xml = '''<?xml version='1.0' encoding='utf-8'?>
<data>
<d>10</d>
<d>20</d>
<d>11</d>
<d>2</d>
<d>5</d>
</data>'''
import pandas as pd
df = pd.read_xml(xml,xpath="//data")
Result
>>> df
d
0 5
It seems it only reads the last data item.
Expected result
However I expect the following DataFrame
as result:
>>> df
d
0 10
1 20
2 11
3 2
4 5
Question
Is it possible to read all the d
tags in a single DataFrame
using pd.read_xml
?
CodePudding user response:
That's not a use case supported by pd.read_xml
. It works best if the XPath results in a list of nodes of the form:
<row>
<col1>...</col1>
<col2>...</col2>
<col3>...</col3>
</row>
You can parse the XML manually:
from xml.etree import ElementTree
xml = '''<?xml version='1.0' encoding='utf-8'?>
<data>
<d>10</d>
<d>20</d>
<d>11</d>
<d>2</d>
<d>5</d>
</data>'''
tree = ElementTree.fromstring(xml)
df = pd.DataFrame({
'd': [int(d.text) for d in tree.findall(".//d")]
})