I'm using Python code that converts my XML to a CSV file and reads specific fields such as "full_name", "item_name", "price", and "in_stock". Unfortunately, I have a problem reading the EAN field. During conversion, I get the error: "AttributeError: 'NoneType' object has no attribute 'text'". When I remove the EAN code, everything works without any problems. How can I modify the code so that it reads the EAN as well? I would be grateful for the specific piece of code that I need to add.
Below is a fragment of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<catalogue date="2022-08-23 15:58" GMT= " 1">
<product>
<id>14726</id>
<manufacturer>Kieslect</manufacturer>
<item_name>Kieslect Smart Tag Lite Pack (2 x Black and 1 x White) Black White</item_name>
<sku>157003-126899-18495_HU03</sku>
<warehouse>HU03</warehouse>
<bar_code>157003-126899-18495</bar_code>
<in_stock><![CDATA[&lt;50]]></in_stock>
<exp_delivery><![CDATA[0]]></exp_delivery>
<delivery_date>0000-00-00</delivery_date>
<price>20.00</price>
<image>https://images.bluefinmobileshop.com/1637675528/large-full/kieslect-smart-tag-lite-pack-2-x-black-and-1-x-white-black-white.jpg</image>
<properties> <full_name>Kieslect Smart Tag Lite (6974377570098)</full_name>
<ean>6974377570098</ean>
</properties>
<category>accessory</category>
</product>
</catalogue>
Here is my Python code:
# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
cols = ["full_name", "item_name", "price", "in_stock", "ean"]
rows = []
# Parsing the XML file
xmlparse = Xet.parse('in.xml')
root = xmlparse.getroot()
parameters = root.findall('.//product')
for product in parameters:
    item_name = product.find("item_name").text
    in_stock = product.find("in_stock").text
    price = product.find("price").text
    sku = product.find("sku").text
    for child in product.findall('.//properties'):
        full_name = child.find('full_name').text
        ean = child.find('ean').text
        rows.append({
            "full_name": full_name,
            "item_name": item_name,
            "price": price,
            "in_stock": in_stock,
            "ean": ean
        })
df = pd.DataFrame(rows, columns=cols)
# Writing dataframe to csv
df.to_csv('out.csv', index=False)
CodePudding user response:
The error most likely comes from the full XML (not the posted sample), where one or more elements (not just <ean>) are missing from some products. In that case find() returns None, and None has no text attribute.
For this reason, consider Element.findtext, which returns None by default instead of raising an error when the element does not exist. Additionally, consider the built-in csv module with its DictWriter and avoid the large pandas library.
# Importing the required libraries
import csv
import xml.etree.ElementTree as Xet
# Parsing the XML file
doc = Xet.parse('in.xml')
# Initialize CSV file for writing
with open('out.csv', 'w', newline='') as csvfile:
    cols = ["full_name", "item_name", "price", "in_stock", "ean"]
    writer = csv.DictWriter(csvfile, fieldnames=cols)
    writer.writeheader()
    # Iterate through elements and write rows to CSV
    parameters = doc.findall('.//product')
    for product in parameters:
        item_name = product.findtext("item_name")
        in_stock = product.findtext("in_stock")
        price = product.findtext("price")
        sku = product.findtext("sku")
        full_name = product.findtext('properties/full_name')
        ean = product.findtext('properties/ean')
        writer.writerow({
            "full_name": full_name,
            "item_name": item_name,
            "price": price,
            "in_stock": in_stock,
            "ean": ean
        })
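To make the difference between find() and findtext() concrete, here is a minimal sketch. The inline two-product sample is made up for illustration (the second product deliberately has no <ean>); it is not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second product has no <ean> element
SAMPLE = """<catalogue>
  <product>
    <item_name>A</item_name>
    <properties><full_name>A full</full_name><ean>111</ean></properties>
  </product>
  <product>
    <item_name>B</item_name>
    <properties><full_name>B full</full_name></properties>
  </product>
</catalogue>"""

root = ET.fromstring(SAMPLE)
products = root.findall(".//product")

# find() returns None for a missing element, so calling .text on the
# result would raise AttributeError: 'NoneType' object has no attribute 'text'
missing = products[1].find("properties/ean")

# findtext() returns the default (None unless specified) instead of raising
eans = [p.findtext("properties/ean", default="") for p in products]
```

Passing default="" is optional; it only keeps the CSV cell explicitly empty when the element is absent.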
CodePudding user response:
@Parfait, thanks for your help. The code finally works! :)
I also have one last question, which I would rather ask here than in a new thread.
I have this Python code:
# Importing the required libraries
import csv
import xml.etree.ElementTree as Xet
# Parsing the XML file
doc = Xet.parse('in.xml')
# Initialize CSV file for writing
with open('out.csv', 'w', newline='') as csvfile:
    cols = ["Indeks", "Nazwa", "Ean", "Stan_mag", "Cena_zakupu_netto", "Link_do_zdjecia"]
    writer = csv.DictWriter(csvfile, fieldnames=cols)
    writer.writeheader()
    # Iterate through elements and write rows to CSV
    parameters = doc.findall('.//Produkt')
    for Produkt in parameters:
        Indeks = Produkt.findtext("Indeks")
        Nazwa = Produkt.findtext("Nazwa")
        Ean = Produkt.findtext("Ean")
        Stan_mag = Produkt.findtext("Stan_mag")
        Cena_zakupu_netto = Produkt.findtext('Cena_zakupu_netto')
        Link_do_zdjecia = Produkt.findtext('Linki_do_zdjec/Link_do_zdjecia')
        writer.writerow({
            "Indeks": Indeks,
            "Nazwa": Nazwa,
            "Ean": Ean,
            "Stan_mag": Stan_mag,
            "Cena_zakupu_netto": Cena_zakupu_netto,
            "Link_do_zdjecia": Link_do_zdjecia
        })
When I use it to convert an XML file with the structure below, it works, but the output file contains only the first link inside <Linki_do_zdjec>. How can I make the output file include the links to pictures 1, 2 and 3, not just the first <Link_do_zdjecia>? How should I handle cases where several tags share the same name?
<Produkt>
<Marka><![CDATA[HP]]></Marka>
<Indeks>UK707A</Indeks>
<Nazwa><![CDATA[Gwarancja HP Care Pack -rozszerzenie gwarancji do 3 lat D2D]]></Nazwa>
<Ean>0884420301066</Ean>
<Kategoria><![CDATA[Komputery i Monitory]]></Kategoria>
<Stan_mag>4</Stan_mag>
<Cena_zakupu_netto>55.00</Cena_zakupu_netto>
<Vat>23</Vat>
<Kod_PCN/>
<Szt_dlugosc>21</Szt_dlugosc>
<Szt_szerokosc>15</Szt_szerokosc>
<Szt_wysokosc>1</Szt_wysokosc>
<Szt_waga_netto>0.0800</Szt_waga_netto>
<Szt_waga_brutto>0.1000</Szt_waga_brutto>
<Linki_do_zdjec/>
<Opis><![CDATA[]]></Opis>
</Produkt>
<Produkt>
<Marka><![CDATA[HP]]></Marka>
<Indeks>AAJ451AA#HP</Indeks>
<Nazwa><![CDATA[HP ExpressCard Smart Card Reader]]></Nazwa>
<Ean>0883585441587</Ean>
<Kategoria><![CDATA[Akcesoria i peryferia]]></Kategoria>
<Stan_mag>1017</Stan_mag>
<Cena_zakupu_netto>5.00</Cena_zakupu_netto>
<Vat>23</Vat>
<Kod_PCN>85234110</Kod_PCN>
<Szt_dlugosc>15</Szt_dlugosc>
<Szt_szerokosc>22</Szt_szerokosc>
<Szt_wysokosc>2</Szt_wysokosc>
<Szt_waga_netto>0.0800</Szt_waga_netto>
<Szt_waga_brutto>0.1000</Szt_waga_brutto>
<Linki_do_zdjec>
<Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_1.jpg]]></Link_do_zdjecia>
<Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_2.jpg]]></Link_do_zdjecia>
<Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_3.jpg]]></Link_do_zdjecia>
</Linki_do_zdjec>
<Opis><![CDATA[]]></Opis>
</Produkt>
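One possible way to handle the repeated <Link_do_zdjecia> tags is to collect them with findall() (findtext() only ever returns the first match) and join them into a single CSV cell. The sketch below is only an illustration under assumptions: the <Produkty> root element, the example.com URLs, the ";" separator and the product_rows helper are all hypothetical, and only two columns are written to keep it short.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the <Produkty> root and URLs are assumptions
SAMPLE = """<Produkty>
  <Produkt>
    <Indeks>UK707A</Indeks>
    <Linki_do_zdjec/>
  </Produkt>
  <Produkt>
    <Indeks>AAJ451AA#HP</Indeks>
    <Linki_do_zdjec>
      <Link_do_zdjecia><![CDATA[https://example.com/6958_1.jpg]]></Link_do_zdjecia>
      <Link_do_zdjecia><![CDATA[https://example.com/6958_2.jpg]]></Link_do_zdjecia>
      <Link_do_zdjecia><![CDATA[https://example.com/6958_3.jpg]]></Link_do_zdjecia>
    </Linki_do_zdjec>
  </Produkt>
</Produkty>"""


def product_rows(root, separator=";"):
    """Yield one dict per <Produkt>, with all photo links joined in one field."""
    for produkt in root.findall(".//Produkt"):
        # findall() returns every matching element, not just the first one
        links = [
            el.text.strip()
            for el in produkt.findall("Linki_do_zdjec/Link_do_zdjecia")
            if el.text
        ]
        yield {
            "Indeks": produkt.findtext("Indeks"),
            "Link_do_zdjecia": separator.join(links),
        }


rows = list(product_rows(ET.fromstring(SAMPLE)))

# Write to an in-memory buffer here; a real script would use open('out.csv', ...)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Indeks", "Link_do_zdjecia"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

A product with an empty <Linki_do_zdjec/> simply gets an empty cell. If separate columns are preferred over one joined cell, the same links list could instead be spread over fields such as Link_1, Link_2 and Link_3 (names chosen here only as an example).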