from io import StringIO

import pandas as pd

xml = '''<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
<doc:row>
<doc:shape value="triangle" />
<doc:degrees value="180" />
<doc:sides value="3.0"/>
</doc:row>
<doc:row>
<doc:shape value="triangle" />
<doc:degrees value="180" />
<doc:sides value="3.0"/>
</doc:row>
<doc:row>
<doc:shape value="triangle" />
<doc:degrees value="180" />
<doc:sides value="3.0"/>
</doc:row>
</doc:data>'''
# Wrap the literal XML in StringIO; passing a raw string is deprecated in recent pandas.
df = pd.read_xml(StringIO(xml),
                 xpath="//doc:row",
                 namespaces={"doc": "https://example.com"})
print(df)
I am getting the output as follows:
   shape  degrees  sides
0    NaN      NaN    NaN
1    NaN      NaN    NaN
2    NaN      NaN    NaN
The expected output is:
      shape  degrees  sides
0  triangle      180    3.0
1  triangle      180    3.0
2  triangle      180    3.0
The data for each tag is stored in its "value" attribute. If the data were in the element text instead of the attribute, it would load properly. Please help me get the respective values for each tag in the XML above.
CodePudding user response:
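If you know the columns beforehand, you can use the iterparse keyword argument instead of xpath (iterparse needs pandas 1.5 or later and an XML file on disk rather than an in-memory string):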
df = pd.read_xml("example.xml",
                 iterparse={"row": ["value", "value", "value"]},
                 names=["shape", "degrees", "sides"])
Output:
      shape  degrees  sides
0  triangle      180    3.0
1  triangle      180    3.0
2  triangle      180    3.0
Edit: the above solution isn't robust, since changing the order of the subelements will scramble the data (the problem being that the subelements all use the same attribute name, value). If the order might change, you can still build the columns one at a time and concatenate them:
df = pd.concat(
    [
        pd.read_xml("example.xml", iterparse={name: ["value"]}, names=[name])
        for name in ["shape", "degrees", "sides"]
    ],
    axis=1,
)
No idea how it would perform on a bigger file, though, since it parses the file once per column...
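If parsing the file several times is a concern, one possible alternative is to collect the value attributes in a single pass with the standard library and only then hand the records to pandas. The following is a minimal sketch, not part of the original answer: the helper name rows_from_value_attrs, the NS mapping, and the second sample row (a square with its children deliberately reordered) are made up for illustration. Because each child is looked up by tag name rather than by position, reordering the subelements should not scramble the columns.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping taken from the question's XML.
NS = {"doc": "https://example.com"}


def rows_from_value_attrs(xml_text, columns=("shape", "degrees", "sides")):
    """Return one dict per <doc:row>, reading each child's `value` attribute.

    Hypothetical helper: children are found by tag name, so their order
    inside a row does not matter. Missing children yield None.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    records = []
    for row in root.findall("doc:row", NS):
        record = {}
        for col in columns:
            child = row.find(f"doc:{col}", NS)
            record[col] = child.get("value") if child is not None else None
        records.append(record)
    return records


# Illustrative sample; the second row is invented and intentionally reordered.
sample = """<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
<doc:row>
<doc:shape value="triangle" />
<doc:degrees value="180" />
<doc:sides value="3.0"/>
</doc:row>
<doc:row>
<doc:sides value="4.0"/>
<doc:shape value="square" />
<doc:degrees value="360" />
</doc:row>
</doc:data>"""

records = rows_from_value_attrs(sample)
for record in records:
    print(record)
```

Passing the result to pd.DataFrame(records) should give a frame with the three columns. Note that every value is still a string at that point, so numeric columns would need an explicit conversion (for example with pd.to_numeric).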