This is the portion of the XML file I am interested in:
</Section>
<Section id="21" name="Event Strips" itemsCount="18">
<Event title="HR Min" start="10707646" end="10709446"/>
<Event title="HR Max" start="1043646" end="1045446"/>
<Event title="RR Min" start="12441170" end="12442970"/>
<Event title="RR Max" start="14690429" end="14692229"/>
</Section>
<Section id="99" name="TimeDomainHRV" itemsCount="4">
<Info id="100" name="RRMean" value="725.99 ms"/>
<Info id="101" name="SDNN" value="108.01 ms"/>
</Section>
I want a DataFrame like the one below, built from the XML:
Event start end
HR Min 10707646 10709446
HR Max 1043646 1045446
..........................
CodePudding user response:
You can just use pandas' read_xml after installing the lxml library.
- Install lxml lib
pip install lxml
- Load xml data in dataframe
import pandas as pd
df = pd.read_xml('data.xml', xpath='.//Event')
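As a rough sketch of how read_xml could be pointed at just the Event rows from the question: the xpath argument selects the elements that become rows, and their attributes become columns. The `<Report>` root element and the in-memory sample are assumptions made for illustration (the question only shows a fragment), and parser="etree" is used here so the sketch runs without lxml; the default lxml parser should accept the same expression.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the fragment from the question wrapped in an
# assumed <Report> root element so that it is a well-formed document.
xml_text = """<Report>
<Section id="21" name="Event Strips" itemsCount="18">
<Event title="HR Min" start="10707646" end="10709446"/>
<Event title="HR Max" start="1043646" end="1045446"/>
</Section>
<Section id="99" name="TimeDomainHRV" itemsCount="4">
<Info id="100" name="RRMean" value="725.99 ms"/>
</Section>
</Report>"""

# Select only the Event elements inside the "Event Strips" section;
# each element's attributes (title, start, end) become columns.
events = pd.read_xml(
    StringIO(xml_text),
    xpath=".//Section[@name='Event Strips']/Event",
    parser="etree",
).rename(columns={"title": "Event"})

print(events.to_string(index=False))
```

For a file on disk, the StringIO(...) argument would be replaced by the file path.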
CodePudding user response:
import xml.etree.ElementTree as ET
import xmltodict
import pandas as pd
import re
import glob
import os

for file in glob.glob(os.path.join("test_xml", "*.xml"), recursive=False):
    # Base name without directory or extension, e.g. "test_xml/data.xml" -> "data"
    file_name = os.path.splitext(os.path.basename(file))[0]
    with open(file) as f:
        xml = f.read()
    # Wrap everything after the XML declaration in a single <root> element so that
    # a file with several top-level elements still parses.
    # This assumes the file starts with an <?xml ...?> declaration.
    root = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>")
    # here you can change the encoding type to be able to set it to the one you need
    xmlstr = ET.tostring(root, encoding='utf-8', method='xml')
    # Dictionary view of the whole document (not needed for the DataFrame below)
    data_dict = dict(xmltodict.parse(xmlstr))
    all_items = []
    # Collect title/start/end from every Event element in the file
    for event in root.iter('Event'):
        alarms_name = event.attrib.get('title')
        start = event.attrib.get('start')
        end = event.attrib.get('end')
        all_items.append([alarms_name, start, end])
    xmlToDf = pd.DataFrame(all_items, columns=['Alarms', 'Start', 'End'])
    print('Creating reference file for ' + file_name + '........................')
    print(xmlToDf.to_string(index=False))
    xmlToDf.to_csv(file_name + '.ref', header=False, index_label=None, sep=';', index=False)
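The root-wrapping step in this answer can also be sketched on its own, using only the standard library. The helper names `parse_fragment` and `event_rows` are hypothetical, and this variant drops the XML declaration instead of keeping it, so it should also cope with text that has no declaration at all. It is only an illustration of the idea, not a drop-in replacement for the script above.

```python
import re
import xml.etree.ElementTree as ET

# Matches an optional leading XML declaration such as <?xml version="1.0"?>
_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_fragment(text):
    """Parse XML text that may contain several top-level elements.

    The declaration (if present) is removed and the remainder is wrapped
    in a synthetic <root> element (a name chosen here for illustration).
    """
    body = _DECLARATION.sub("", text, count=1)
    return ET.fromstring("<root>" + body + "</root>")


def event_rows(root):
    """Return (title, start, end) tuples for every Event element."""
    return [
        (ev.get("title"), int(ev.get("start")), int(ev.get("end")))
        for ev in root.iter("Event")
    ]


sample = """<?xml version="1.0"?>
<Section id="21" name="Event Strips" itemsCount="18">
<Event title="HR Min" start="10707646" end="10709446"/>
<Event title="HR Max" start="1043646" end="1045446"/>
</Section>
<Section id="99" name="TimeDomainHRV" itemsCount="4">
<Info id="100" name="RRMean" value="725.99 ms"/>
</Section>"""

rows = event_rows(parse_fragment(sample))
print(rows)
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=['Event', 'start', 'end']) to get the table from the question.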