How to add all columns (XML to CSV)

I need help converting XML to CSV. I have partly succeeded, but I don't know how to add the time value and phase id in my Python code.

I have the following XML, copied from an XML link:

<?xml version="1.0" encoding="UTF-8"?>
<akouda>
    <time value="2022-04-12 13:45:00">
        <phases>
            <phase id="0">
                <act_energy>1.2000000000000455</act_energy>
                <react_energy>1.9711529080673937</react_energy>
                <current_inst>7.08</current_inst>
                <voltage_inst>242.7</voltage_inst>
                <power_inst>0.9</power_inst>
                <power_fact>0.52</power_fact>
                <thd>66.45</thd>
            </phase>
            <phase id="1">
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>16.1</current_inst>
                <voltage_inst>242</voltage_inst>
                <power_inst>2.38</power_inst>
                <power_fact>0.61</power_fact>
                <thd>31</thd>
            </phase>
            <phase id="2">
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>8.64</current_inst>
                <voltage_inst>242.7</voltage_inst>
                <power_inst>2.01</power_inst>
                <power_fact>0.95</power_fact>
                <thd>26.81</thd>
            </phase>
        </phases>
    </time>
    <time value="2022-04-12 13:30:00">
        <phases>
            <phase id="0">
                <act_energy>1.2999999999999545</act_energy>
                <react_energy>2.1354156504061876</react_energy>
                <current_inst>7.06</current_inst>
                <voltage_inst>242.2</voltage_inst>
                <power_inst>0.9</power_inst>
                <power_fact>0.52</power_fact>
                <thd>65.89</thd>
            </phase>
            <phase id="1">
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>16.95</current_inst>
                <voltage_inst>241</voltage_inst>
                <power_inst>2.61</power_inst>
                <power_fact>0.63</power_fact>
                <thd>29.1</thd>
            </phase>
            <phase id="2">
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>9.57</current_inst>
                <voltage_inst>242.4</voltage_inst>
                <power_inst>2.23</power_inst>
                <power_fact>0.96</power_fact>
                <thd>24.12</thd>
            </phase>
        </phases>
    </time>
</akouda>

and the following code to convert XML to CSV :

import xml.etree.ElementTree as Xet
import pandas as pd

rows = []

# Parsing the XML file
xmlparse = Xet.parse('sample.xml')
root = xmlparse.getroot()
# <phase> elements are nested under <time>/<phases>, so search the whole tree
for i in root.iter('phase'):
    act_energy = i.find("act_energy").text
    react_energy = i.find("react_energy").text
    current_inst = i.find("current_inst").text
    voltage_inst = i.find("voltage_inst").text
    power_inst = i.find("power_inst").text
    power_fact = i.find("power_fact").text
    thd = i.find("thd").text


    rows.append({
        "act_energy": act_energy,
        "react_energy": react_energy,
        "current_inst": current_inst,
        "voltage_inst": voltage_inst,
        "power_inst": power_inst,
        "power_fact": power_fact,
        "thd": thd,
    })

df = pd.DataFrame(rows)

# Writing dataframe to csv
df.to_csv('output.csv')

How can I include the time value and phase id in the Python code?
How can I read the XML from the link instead of from a file?

Thanks

CodePudding user response:

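You can use pd.read_xml (pandas 1.3 or later, with the default lxml parser) and an XPath that selects both the time elements and the phase elements. The rows come back in document order, so each time row, which carries only the value attribute, precedes its phases. Forward-filling value copies the timestamp onto the phase rows, and the two dropna calls then discard the all-empty columns and the time-only rows. For example (read_xml also accepts a URL):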

import pandas as pd

df = pd.read_xml("data.xml", xpath="//phases/* | //time")
# copy each <time> row's timestamp down onto the <phase> rows that follow it
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))

Prints:

                 value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
1  2022-04-12 13:45:00  0.0         1.2      1.971153          7.08         242.7        0.90        0.52  66.45
2  2022-04-12 13:45:00  1.0         0.0      0.000000         16.10         242.0        2.38        0.61  31.00
3  2022-04-12 13:45:00  2.0         0.0      0.000000          8.64         242.7        2.01        0.95  26.81
5  2022-04-12 13:30:00  0.0         1.3      2.135416          7.06         242.2        0.90        0.52  65.89
6  2022-04-12 13:30:00  1.0         0.0      0.000000         16.95         241.0        2.61        0.63  29.10
7  2022-04-12 13:30:00  2.0         0.0      0.000000          9.57         242.4        2.23        0.96  24.12
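
If you would rather keep the ElementTree approach from the question, a minimal sketch is to loop over each time element and then over its phase elements, copying the value and id attributes into every row. This is an alternative to the pandas solution, not part of it. The helper names phase_rows and rows_to_csv and the trimmed SAMPLE_XML are illustrative assumptions, and the standard csv module stands in for DataFrame.to_csv so the sketch runs without pandas:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: same layout as the real feed),
# so the sketch runs without a file or a network call.
SAMPLE_XML = """<akouda>
  <time value="2022-04-12 13:45:00">
    <phases>
      <phase id="0">
        <act_energy>1.2000000000000455</act_energy>
        <thd>66.45</thd>
      </phase>
      <phase id="1">
        <act_energy>0</act_energy>
        <thd>31</thd>
      </phase>
    </phases>
  </time>
</akouda>"""


def phase_rows(xml_text):
    """Return one dict per <phase>, tagged with its parent <time> value and its own id."""
    root = ET.fromstring(xml_text)
    rows = []
    for time_el in root.findall("time"):
        for phase_el in time_el.findall("phases/phase"):
            row = {"time": time_el.get("value"), "phase_id": phase_el.get("id")}
            # every child element (act_energy, thd, ...) becomes a column
            for child in phase_el:
                row[child.tag] = child.text
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the row dicts to CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = phase_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

With pandas available, pd.DataFrame(rows).to_csv("output.csv", index=False) should write the same rows to a file.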

EDIT: To read from the provided URL:

from html import unescape
from io import StringIO

import pandas as pd
import requests

url = "https://issat.ttn.tn/cu/export/akouda.php"

# quick-and-dirty method to remove the first <pre> and the last </pre>;
# ideally, you would do this with an HTML parser:
s = unescape(requests.get(url).text)[5:-6]

# wrap the text in StringIO: recent pandas versions deprecate passing literal XML strings
df = pd.read_xml(StringIO(s), xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
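
The slice above depends on the response starting with exactly <pre> and ending with exactly </pre>. A more tolerant sketch, using only the standard library's html.parser, is shown here. It assumes the page wraps the escaped XML in a pre element. The class name PreExtractor, the helper extract_pre, and the sample_page string are made up for illustration and are not taken from the real page:

```python
from html.parser import HTMLParser


class PreExtractor(HTMLParser):
    """Collect the text found inside <pre>...</pre> elements."""

    def __init__(self):
        # convert_charrefs=True unescapes entities such as &lt; before handle_data sees them
        super().__init__(convert_charrefs=True)
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # keep text only while we are inside a <pre> element
        if self._depth:
            self.chunks.append(data)


def extract_pre(html_text):
    """Return the unescaped text content of the <pre> element(s) in html_text."""
    parser = PreExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical stand-in for requests.get(url).text
sample_page = '<pre>&lt;akouda&gt;&lt;time value="2022-04-12 13:45:00"/&gt;&lt;/akouda&gt;</pre>\n'
xml_text = extract_pre(sample_page)
print(xml_text)
```

The resulting xml_text could then be wrapped in io.StringIO and passed to pd.read_xml in place of the sliced string.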