I'm new to Python and I'm having trouble extracting the image URLs from an XML file. I know the XML file is oddly structured and poorly made, but it arrives on the server in this format and I can't change it.
XML structure:
```xml
<?xml version="1.0" encoding="utf-8" ?>
<test>
<test-item>
<sku>098730</sku>
<name><![CDATA[Bala bla bla]]></name>
<description><![CDATA[Bala bla bla. Bala bla bla. Bala bla bla.]]>
</description>
<image><![CDATA[image url]]></image>
<image2><![CDATA[image url]]></image2>
<image3><![CDATA[image url]]></image3>
<image4><![CDATA[image url]]></image4>
</test-item>
</test>
```
How can I properly extract the images (<image>, <image2>, ... etc.) from this file, given its bad structure?
Try the following:

```python
import xml.etree.ElementTree as ET

xml_data = '''<?xml version="1.0" encoding="utf-8" ?>
<test>
<test-item>
<sku>098730</sku>
<name><![CDATA[Bala bla bla]]></name>
<description><![CDATA[Bala bla bla. Bala bla bla. Bala bla bla.]]>
</description>
<image><![CDATA[image url1]]></image>
<image2><![CDATA[image url2]]></image2>
<image3><![CDATA[image url43]]></image3>
<image4><![CDATA[image url4]]></image4>
</test-item>
</test>'''

root = ET.fromstring(xml_data)
images = []
counter = 0  # 0 = the unnumbered <image> tag; 2, 3, ... = <image2>, <image3>, ...
while True:
    if counter == 0:
        img = root.find('.//image')
        if img is None:
            break
        images.append(img.text)  # CDATA content comes back as plain .text
        counter = 2  # numbered tags start at image2
    else:
        img = root.find('.//image{}'.format(counter))
        if img is None:
            break  # stop at the first missing imageN tag
        images.append(img.text)
        counter += 1

for idx, image in enumerate(images, 1):
    print('{}) {}'.format(idx, image))
```
Output:

```
1) image url1
2) image url2
3) image url43
4) image url4
```
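The loop above reads only the first <test-item> (because './/image' matches the first such element anywhere in the document) and stops at the first missing number. As a sketch of a more general approach, assuming each item has a <sku> child and the image tags follow the image, image2, image3, ... naming pattern, you can walk each item's children and pick out the image tags with a regular expression. The function name extract_images, the IMAGE_TAG pattern, and the second test item in the sample data are illustrative assumptions, not part of the original file.

```python
import re
import xml.etree.ElementTree as ET

# Matches "image", "image2", "image3", ...; the group captures the (optional) number.
IMAGE_TAG = re.compile(r"image(\d*)")


def extract_images(root):
    """Return {sku: [url, ...]} for every <test-item> under root.

    Assumes each item has a <sku> child. Gaps in the image numbering are
    tolerated, and tags may appear in any order within an item.
    """
    result = {}
    for item in root.iter("test-item"):
        found = []
        for child in item:
            match = IMAGE_TAG.fullmatch(child.tag)
            if match and child.text:
                # The bare "image" tag counts as number 1 so it sorts first.
                number = int(match.group(1) or 1)
                found.append((number, child.text.strip()))
        found.sort()
        result[item.findtext("sku")] = [url for _, url in found]
    return result


xml_data = """<?xml version="1.0" encoding="utf-8" ?>
<test>
<test-item>
<sku>098730</sku>
<name><![CDATA[Bala bla bla]]></name>
<image><![CDATA[image url1]]></image>
<image3><![CDATA[image url3]]></image3>
<image2><![CDATA[image url2]]></image2>
</test-item>
<test-item>
<sku>098731</sku>
<image><![CDATA[other url1]]></image>
</test-item>
</test>"""

# For a file on disk, ET.parse("products.xml").getroot() gives the same root
# ("products.xml" is a hypothetical file name).
root = ET.fromstring(xml_data)
for sku, urls in extract_images(root).items():
    print(sku, urls)
```

With the sample data this prints one line per item, with each item's URLs sorted by tag number (image, image2, image3), regardless of the order the tags appear in the file.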