my ad-hoc question relates to the following
given this xml structure (saved in xml_FILE):
<countriesAndStates>
<countries>
<name>USA</name>
<states>
<active>true</active>
<stateName>Colorado</stateName>
<isoCode>CO</isoCode>
</states>
<states>
<active>false</active>
<stateName>Florida</stateName>
<isoCode>FL</isoCode>
</states>
</countries>
</countriesAndStates>
I traverse the state element level via this for-loop and save the result to a dictionary of lists through the help of collections from defaultdict as follows:
from collections import defaultdict
tree = ET.parse(xml_FILE)
root = tree.getroot()
dict_of_list = defaultdict(list)
for key in root.findall("./countriesAndStates/"
"countries/"
"states/"):
dict_of_list[key.tag].append(key.text)
I then transform this dict to a dataframe and I'll have all the tuples containing the state element's data, cf.:
df = pd.DataFrame(dict_of_list)
print(df)
This way I get the following dataframe output (scheme tuple):
active stateName isoCode
0 true Colorado CO
However, I want to have the country for each state tuple such that each tuple/row in dataframe will transform to:
name active stateName isoCode
0 USA true Colorado CO
In other words: for each state/record I want the country name too. How can I achieve this?
Thank you in advance.
CodePudding user response:
something like this
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<countriesAndStates>
<countries>
<name>USA</name>
<states>
<active>true</active>
<stateName>Colorado</stateName>
<isoCode>CO</isoCode>
</states>
<states>
<active>false</active>
<stateName>Florida</stateName>
<isoCode>FL</isoCode>
</states>
</countries>
</countriesAndStates>'''
data = []
root = ET.fromstring(xml)
for country in root.findall('.//countries'):
name = country.find('name').text
for state in country.findall('states'):
data.append({'name':name})
for e in list(state):
data[-1][e.tag] = e.text
df = pd.DataFrame(data)
print(df)
output
name active stateName isoCode
0 USA true Colorado CO
1 USA false Florida FL