I need help converting XML to CSV. I have partly succeeded, but I don't know how to include the time value and phase id in my Python code.
I have the following XML, copied from an XML link:
<?xml version="1.0" encoding="UTF-8"?>
<akouda>
<time value="2022-04-12 13:45:00">
<phases>
<phase id="0">
<act_energy>1.2000000000000455</act_energy>
<react_energy>1.9711529080673937</react_energy>
<current_inst>7.08</current_inst>
<voltage_inst>242.7</voltage_inst>
<power_inst>0.9</power_inst>
<power_fact>0.52</power_fact>
<thd>66.45</thd>
</phase>
<phase id="1">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>16.1</current_inst>
<voltage_inst>242</voltage_inst>
<power_inst>2.38</power_inst>
<power_fact>0.61</power_fact>
<thd>31</thd>
</phase>
<phase id="2">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>8.64</current_inst>
<voltage_inst>242.7</voltage_inst>
<power_inst>2.01</power_inst>
<power_fact>0.95</power_fact>
<thd>26.81</thd>
</phase>
</phases>
</time>
<time value="2022-04-12 13:30:00">
<phases>
<phase id="0">
<act_energy>1.2999999999999545</act_energy>
<react_energy>2.1354156504061876</react_energy>
<current_inst>7.06</current_inst>
<voltage_inst>242.2</voltage_inst>
<power_inst>0.9</power_inst>
<power_fact>0.52</power_fact>
<thd>65.89</thd>
</phase>
<phase id="1">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>16.95</current_inst>
<voltage_inst>241</voltage_inst>
<power_inst>2.61</power_inst>
<power_fact>0.63</power_fact>
<thd>29.1</thd>
</phase>
<phase id="2">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>9.57</current_inst>
<voltage_inst>242.4</voltage_inst>
<power_inst>2.23</power_inst>
<power_fact>0.96</power_fact>
<thd>24.12</thd>
</phase>
</phases>
</time>
</akouda>
and the following code to convert the XML to CSV:
import xml.etree.ElementTree as Xet
import pandas as pd

rows = []

# Parsing the XML file
xmlparse = Xet.parse('sample.xml')
root = xmlparse.getroot()

# <phase> elements are nested under <time>/<phases>, so search the whole tree;
# root.findall('phases') only looks at direct children of the root and finds nothing
for i in root.iter('phase'):
    act_energy = i.find("act_energy").text
    react_energy = i.find("react_energy").text
    current_inst = i.find("current_inst").text
    voltage_inst = i.find("voltage_inst").text
    power_inst = i.find("power_inst").text
    power_fact = i.find("power_fact").text
    thd = i.find("thd").text
    rows.append({
        "act_energy": act_energy,
        "react_energy": react_energy,
        "current_inst": current_inst,
        "voltage_inst": voltage_inst,
        "power_inst": power_inst,
        "power_fact": power_fact,
        "thd": thd,
    })

df = pd.DataFrame(rows)

# Writing dataframe to csv
df.to_csv('output.csv')
- How can I include the time value and phase id in the Python code?
- How can I read the XML from the link instead of from a file?
Thanks
CodePudding user response:
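You can use the pd.read_xml function with a suitable XPath expression (you can pass a URL to .read_xml() as well). The XPath below selects both the <time> elements and the <phase> elements, so each time row is followed by its phase rows; forward-filling the "value" column copies the timestamp down to the phases, and the dropna calls then remove the empty columns and the time-only rows. For example: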
df = pd.read_xml("data.xml", xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
Prints:
                 value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
1  2022-04-12 13:45:00  0.0         1.2      1.971153          7.08         242.7        0.90        0.52  66.45
2  2022-04-12 13:45:00  1.0         0.0      0.000000         16.10         242.0        2.38        0.61  31.00
3  2022-04-12 13:45:00  2.0         0.0      0.000000          8.64         242.7        2.01        0.95  26.81
5  2022-04-12 13:30:00  0.0         1.3      2.135416          7.06         242.2        0.90        0.52  65.89
6  2022-04-12 13:30:00  1.0         0.0      0.000000         16.95         241.0        2.61        0.63  29.10
7  2022-04-12 13:30:00  2.0         0.0      0.000000          9.57         242.4        2.23        0.96  24.12
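If you would rather stay with ElementTree, as in the question, here is a minimal sketch of the same idea using only the standard library. It assumes the XML has the shape shown in the question; SAMPLE_XML is a shortened stand-in for the real data, and the helper names xml_to_rows and rows_to_csv are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (assumption: same structure).
SAMPLE_XML = """<akouda>
<time value="2022-04-12 13:45:00">
<phases>
<phase id="0"><act_energy>1.2</act_energy><react_energy>1.97</react_energy>
<current_inst>7.08</current_inst><voltage_inst>242.7</voltage_inst>
<power_inst>0.9</power_inst><power_fact>0.52</power_fact><thd>66.45</thd></phase>
<phase id="1"><act_energy>0</act_energy><react_energy>0</react_energy>
<current_inst>16.1</current_inst><voltage_inst>242</voltage_inst>
<power_inst>2.38</power_inst><power_fact>0.61</power_fact><thd>31</thd></phase>
</phases>
</time>
<time value="2022-04-12 13:30:00">
<phases>
<phase id="0"><act_energy>1.3</act_energy><react_energy>2.13</react_energy>
<current_inst>7.06</current_inst><voltage_inst>242.2</voltage_inst>
<power_inst>0.9</power_inst><power_fact>0.52</power_fact><thd>65.89</thd></phase>
</phases>
</time>
</akouda>"""

# Child elements of each <phase> that become CSV columns.
FIELDS = ["act_energy", "react_energy", "current_inst", "voltage_inst",
          "power_inst", "power_fact", "thd"]


def xml_to_rows(xml_text):
    """Return one dict per <phase>, tagged with its parent <time> value."""
    root = ET.fromstring(xml_text)
    rows = []
    for time_el in root.findall("time"):
        for phase_el in time_el.findall("phases/phase"):
            # Attributes are read with .get(); element text with .findtext().
            row = {"time": time_el.get("value"), "phase_id": phase_el.get("id")}
            for name in FIELDS:
                row[name] = phase_el.findtext(name)
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["time", "phase_id"] + FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The point is the nested loop: the outer loop keeps the current <time> element in scope, so its value attribute can be attached to every phase beneath it. All values stay strings here; convert them if you need numbers.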
EDIT: To read from the provided URL:
import requests
import pandas as pd
from html import unescape
from io import StringIO
url = "https://issat.ttn.tn/cu/export/akouda.php"
# quick-and-dirty method to remove the first <pre> and the last </pre>
# ideally, you would do this with an HTML parser:
s = unescape(requests.get(url).text)[5:-6]
# wrap the string in StringIO: newer pandas versions deprecate passing literal XML text
df = pd.read_xml(StringIO(s), xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
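For the "do this with an HTML parser" remark, here is a rough sketch using the standard library's html.parser. It assumes the page serves the XML as escaped text inside a single <pre> element, which is what the slicing above implies. The class PreExtractor, the function extract_pre_text and the sample page string are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class PreExtractor(HTMLParser):
    """Collect the text found inside <pre>...</pre> (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns &lt; and &gt; back into < and > for us.
        super().__init__(convert_charrefs=True)
        self.in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        # Keep only the text that sits between <pre> and </pre>.
        if self.in_pre:
            self.chunks.append(data)


def extract_pre_text(html_text):
    """Return the unescaped text content of the <pre> block(s)."""
    parser = PreExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).strip()


# Made-up page in the assumed format: escaped XML wrapped in <pre>.
page = ('<pre>&lt;akouda&gt;&lt;time value="2022-04-12 13:45:00"&gt;'
        '&lt;/time&gt;&lt;/akouda&gt;</pre>')
xml_text = extract_pre_text(page)
print(xml_text)
```

The resulting xml_text could then be wrapped in StringIO and passed to pd.read_xml in place of the sliced string. Unlike the [5:-6] slice, this does not depend on the exact number of characters before and after the XML.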