Home > Software engineering >  How to read specific data and write to csv file
How to read specific data and write to csv file

Time:06-26

I have data in xml file and I am reading 3 columns : price , name , calories

xml data


<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <price>$5.95</price>
    <name>Belgian Waffles</name>
    <desc>Two of our famous Belgian Waffles with plenty of real maple syrup</desc>
    <calories>650</calories>
  </row>
  <row>
    <index>1</index>
    <price>$7.95</price>
    <name>Strawberry Belgian Waffles</name>
    <desc>Light Belgian waffles covered with strawberries and whipped cream</desc>
    <calories>900</calories>
  </row>
  <row>
    <index>2</index>
    <price>$8.95</price>
    <name>Berry-Berry Belgian Waffles</name>
    <desc>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</desc>
    <calories>900</calories>
  </row>
  <row>
    <index>3</index>
    <price>$4.50</price>
    <name>French Toast</name>
    <desc>Thick slices made from our homemade sourdough bread</desc>
    <calories>600</calories>
  </row>
  <row>
    <index>4</index>
    <price>$6.95</price>
    <name>Homestyle Breakfast</name>
    <desc>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</desc>
    <calories>950</calories>
  </row>
</data>

Code :

import xml.etree.ElementTree as ET

parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()


for data in get_root_element.findall('row'):

  prc = data.find('price')
  nm  = data.find('name')
  cal = data.find('calories')
  temp = prc.text   ','    nm.text   ','   cal.text
  print(temp)

The above code gives me data but need to store this data to csv file

How do I need to write logic for this. Is it possible to do with pandas / csv

Need to add my headers as well to that csv file

Headers : price , name , calories

CodePudding user response:

Solution of @kiric8494 is good enough, you can stay with it. You can also implement it using csv.DictWriter which will be quite shorter:

import xml.etree.ElementTree as ET
from csv import DictWriter

parse_xml = ET.parse(r"/content/sample_data/xyz.xml")
root = parse_xml.getroot()
with open(r"/content/sample_data/abc.csv", "w", newline="") as f:
    writer = DictWriter(f, fieldnames=("price", "name", "calories"), extrasaction="ignore")
    writer.writeheader()
    writer.writerows({e.tag: e.text for e in row} for row in root)

Basically we set up DictWriter to ignore all fields except price, name and calories and then pass generator to .writerows() which construct dictionary of all child nodes of <row> where key is tag and value is text.


You can help my country, check my profile info.

CodePudding user response:

Thanks for the solution @ewz93

I have done in following way

import xml.etree.ElementTree as ET
import csv

parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()

final_data_set = [] # [ [],[],[] ]

for data in get_root_element.findall('row'):
  temp = []
  prc = data.find('price')
  nm  = data.find('name')
  cal = data.find('calories')
  temp=[prc.text   ','    nm.text    ','   cal.text]
  final_data_set.append(temp)

headers = ['price','name','calories']

with open('/content/sample_data/abc.csv','w') as wr:
  csv_wr = csv.writer(wr)
  csv_wr.writerow(headers)
  for elem in final_data_set:
    for item in elem:
      csv_wr.writerow(item.split(','))

CodePudding user response:

I would just put the values in lists and create a DataFrame from that:

import xml.etree.ElementTree as ET
import pandas as pd

parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()

prc_list = []
nm_list = []
cal_list = []

for data in get_root_element.findall('row'):
  prc_list.append(data.find('price'))
  nm_list.append(data.find('name'))
  cal_list.append(data.find('calories'))

df = pd.DataFrame({"price": prc_list, "name": nm_list, "calories": cal_list})
df.to_excel("your_file_name.xlsx") # or if you really want a CSV use df.to_csv("your_file_name.xlsx")

This is probably not the prettiest solution because there is also pandas.read_xml() so you probably could make it shorter and avoid having to use etree by directly reading the XML into a DataFrame and then directly write that to CSV.

  • Related