Creating python dataframe from nested xml

Time:10-29

This is the portion of the XML file I am interested in:

</Section>
<Section id="21" name="Event Strips" itemsCount="18">
    <Event title="HR Min" start="10707646" end="10709446"/>
    <Event title="HR Max" start="1043646" end="1045446"/>
    <Event title="RR Min" start="12441170" end="12442970"/>
    <Event title="RR Max" start="14690429" end="14692229"/>
</Section>
<Section id="99" name="TimeDomainHRV" itemsCount="4">
    <Info id="100" name="RRMean" value="725.99 ms"/>
    <Info id="101" name="SDNN" value="108.01 ms"/>
</Section>

I want to build a dataframe like the one below from the XML:

Event   start     end
HR Min  10707646  10709446
HR Max  1043646   1045446
..........................

CodePudding user response:

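You can just use read_xml once the lxml library is installed. Because the Event elements are nested inside Section elements, pass an xpath that selects them.

  1. Install the lxml library
pip install lxml
  2. Load the XML data into a dataframe
import pandas as pd

# './/Event' matches every <Event> at any depth; its attributes become columns
pd.read_xml('data.xml', xpath='.//Event')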
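`read_xml` picks its rows with the `xpath` argument, which defaults to the root's direct children (`./*`), so for the nested file in the question `.//Event` is likely the expression needed. A minimal sketch, assuming pandas 1.3 or later and assuming the sections sit under a single root element (called `Report` here purely for illustration, since the question does not show it). `parser="etree"` is passed only so the sketch runs without lxml; omit it to use lxml as in the steps above.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the question's excerpt wrapped in an assumed <Report> root.
xml_text = """<Report>
  <Section id="21" name="Event Strips" itemsCount="18">
    <Event title="HR Min" start="10707646" end="10709446"/>
    <Event title="HR Max" start="1043646" end="1045446"/>
  </Section>
  <Section id="99" name="TimeDomainHRV" itemsCount="4">
    <Info id="100" name="RRMean" value="725.99 ms"/>
  </Section>
</Report>"""

# Select every <Event> at any depth; each XML attribute becomes a column.
events = pd.read_xml(StringIO(xml_text), xpath=".//Event", parser="etree")

# Rename the 'title' attribute to match the column name asked for in the question.
events = events.rename(columns={"title": "Event"})
print(events.to_string(index=False))
```

The `Info` elements are ignored because the xpath only matches `Event`; a second call with `xpath=".//Info"` would give a separate frame for them.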

CodePudding user response:

import glob
import os
import re
import xml.etree.ElementTree as ET

import pandas as pd

for file in glob.glob(os.path.join("test_xml", "*.xml")):
    # Base name up to the first dot, e.g. "test_xml\\rec1.xml" -> "rec1"
    file_name = os.path.basename(file).split('.')[0]
    with open(file) as f:
        xml = f.read()

    # Wrap everything after the XML declaration in a single <root> element,
    # so a file with several top-level <Section> elements still parses.
    root = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>")

    # Collect title/start/end from every <Event>, at any nesting depth
    all_items = []
    for event in root.iter('Event'):
        alarms_name = event.attrib.get('title')
        start = event.attrib.get('start')
        end = event.attrib.get('end')
        all_items.append([alarms_name, start, end])

    xmlToDf = pd.DataFrame(all_items, columns=['Alarms', 'Start', 'End'])
    print('Creating reference file for ' + file_name + '........................')
    print(xmlToDf.to_string(index=False))
    xmlToDf.to_csv(file_name + '.ref', header=False, index_label=None, sep=';', index=False)
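
The extraction step in the loop above can also be tried in isolation on a string, without touching the file system. A minimal sketch using only the standard library; `events_to_rows` and the `<root>` wrapper in the sample are hypothetical names introduced for illustration, and the sample is the excerpt from the question.

```python
import xml.etree.ElementTree as ET


def events_to_rows(xml_text):
    """Return one dict per <Event> element, found at any nesting depth."""
    root = ET.fromstring(xml_text)
    return [
        {
            "Event": ev.get("title"),
            "start": int(ev.get("start")),  # attributes arrive as strings
            "end": int(ev.get("end")),
        }
        for ev in root.iter("Event")
    ]


# Assumed wrapper element <root>; the question does not show the real one.
sample = """<root>
<Section id="21" name="Event Strips" itemsCount="18">
    <Event title="HR Min" start="10707646" end="10709446"/>
    <Event title="HR Max" start="1043646" end="1045446"/>
</Section>
<Section id="99" name="TimeDomainHRV" itemsCount="4">
    <Info id="100" name="RRMean" value="725.99 ms"/>
</Section>
</root>"""

rows = events_to_rows(sample)
for row in rows:
    print(row["Event"], row["start"], row["end"])
```

Passing `rows` to `pd.DataFrame(rows)` should then give the three-column frame the question asks for.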
