I have data in xml file and I am reading 3 columns : price , name , calories
xml data
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<price>$5.95</price>
<name>Belgian Waffles</name>
<desc>Two of our famous Belgian Waffles with plenty of real maple syrup</desc>
<calories>650</calories>
</row>
<row>
<index>1</index>
<price>$7.95</price>
<name>Strawberry Belgian Waffles</name>
<desc>Light Belgian waffles covered with strawberries and whipped cream</desc>
<calories>900</calories>
</row>
<row>
<index>2</index>
<price>$8.95</price>
<name>Berry-Berry Belgian Waffles</name>
<desc>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</desc>
<calories>900</calories>
</row>
<row>
<index>3</index>
<price>$4.50</price>
<name>French Toast</name>
<desc>Thick slices made from our homemade sourdough bread</desc>
<calories>600</calories>
</row>
<row>
<index>4</index>
<price>$6.95</price>
<name>Homestyle Breakfast</name>
<desc>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</desc>
<calories>950</calories>
</row>
</data>
Code :
import xml.etree.ElementTree as ET
parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()
for data in get_root_element.findall('row'):
prc = data.find('price')
nm = data.find('name')
cal = data.find('calories')
temp = prc.text ',' nm.text ',' cal.text
print(temp)
The above code gives me data but need to store this data to csv file
How do I need to write logic for this. Is it possible to do with pandas / csv
Need to add my headers as well to that csv file
Headers : price , name , calories
CodePudding user response:
Solution of @kiric8494 is good enough, you can stay with it. You can also implement it using csv.DictWriter
which will be quite shorter:
import xml.etree.ElementTree as ET
from csv import DictWriter
parse_xml = ET.parse(r"/content/sample_data/xyz.xml")
root = parse_xml.getroot()
with open(r"/content/sample_data/abc.csv", "w", newline="") as f:
writer = DictWriter(f, fieldnames=("price", "name", "calories"), extrasaction="ignore")
writer.writeheader()
writer.writerows({e.tag: e.text for e in row} for row in root)
Basically we set up DictWriter
to ignore all fields except price
, name
and calories
and then pass generator to .writerows()
which construct dictionary of all child nodes of <row>
where key is tag and value is text.
You can help my country, check my profile info.
CodePudding user response:
Thanks for the solution @ewz93
I have done in following way
import xml.etree.ElementTree as ET
import csv
parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()
final_data_set = [] # [ [],[],[] ]
for data in get_root_element.findall('row'):
temp = []
prc = data.find('price')
nm = data.find('name')
cal = data.find('calories')
temp=[prc.text ',' nm.text ',' cal.text]
final_data_set.append(temp)
headers = ['price','name','calories']
with open('/content/sample_data/abc.csv','w') as wr:
csv_wr = csv.writer(wr)
csv_wr.writerow(headers)
for elem in final_data_set:
for item in elem:
csv_wr.writerow(item.split(','))
CodePudding user response:
I would just put the values in lists and create a DataFrame from that:
import xml.etree.ElementTree as ET
import pandas as pd
parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()
prc_list = []
nm_list = []
cal_list = []
for data in get_root_element.findall('row'):
prc_list.append(data.find('price'))
nm_list.append(data.find('name'))
cal_list.append(data.find('calories'))
df = pd.DataFrame({"price": prc_list, "name": nm_list, "calories": cal_list})
df.to_excel("your_file_name.xlsx") # or if you really want a CSV use df.to_csv("your_file_name.xlsx")
This is probably not the prettiest solution because there is also pandas.read_xml() so you probably could make it shorter and avoid having to use etree by directly reading the XML into a DataFrame and then directly write that to CSV.