How to import all images from a badly structured XML file in Python

Time:06-02

I'm new to Python and am having trouble importing photos from an XML file. I know the XML file is poorly structured, but this is the format in which it arrives at the server, and it cannot be changed.

XML structure:

<?xml version="1.0" encoding="utf-8" ?>
<test>
    <test-item>
            <sku>098730</sku>
            <name><![CDATA[Bala bla bla]]></name>
            <description><![CDATA[Bala bla bla. Bala bla bla. Bala bla bla.]]>
            </description>
            <image><![CDATA[image url]]></image>
            <image2><![CDATA[image url]]></image2>
            <image3><![CDATA[image url]]></image3>
            <image4><![CDATA[image url]]></image4>
    </test-item>
</test>

How can I properly import the images (<image>, <image2>, etc.) from this badly structured file?

CodePudding user response:

Try the following:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="utf-8" ?>
<test>
    <test-item>
            <sku>098730</sku>
            <name><![CDATA[Bala bla bla]]></name>
            <description><![CDATA[Bala bla bla. Bala bla bla. Bala bla bla.]]>
            </description>
            <image><![CDATA[image url1]]></image>
            <image2><![CDATA[image url2]]></image2>
            <image3><![CDATA[image url43]]></image3>
            <image4><![CDATA[image url4]]></image4>
    </test-item>
</test>'''

root = ET.fromstring(xml)
images = []
counter = 0
while True:
    if counter == 0:
        # The first image tag has no numeric suffix
        img = root.find('.//image')
        if img is None:
            break
        images.append(img.text)
        counter = 2
    else:
        # Later tags are image2, image3, ...; stop at the first missing one
        img = root.find('.//image{}'.format(counter))
        if img is None:
            break
        images.append(img.text)
        counter += 1
for idx, image in enumerate(images, 1):
    print('{}) {}'.format(idx, image))

output

1) image url1
2) image url2
3) image url43
4) image url4
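
If the real file contains several <test-item> elements, or the numbering has gaps (say <image> and <image3> but no <image2>), a variation that walks the children of each item and matches the tag name against a pattern may be easier to work with. This is only a sketch under assumptions: the function name collect_images, the IMAGE_TAG pattern, and the sample data are made up for illustration, and it assumes every item has a <sku> that can serve as a key.

```python
import re
import xml.etree.ElementTree as ET

# Matches "image", "image2", "image3", ... but not other tags such as "imageset"
IMAGE_TAG = re.compile(r"image\d*")


def collect_images(xml_text):
    """Return {sku: [image urls]} for every <test-item> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for item in root.iter("test-item"):
        sku = item.findtext("sku")
        # Keep direct children whose tag looks like an image tag and has text
        result[sku] = [
            child.text.strip()
            for child in item
            if IMAGE_TAG.fullmatch(child.tag) and child.text and child.text.strip()
        ]
    return result


# Illustrative sample data, not taken from the real feed
sample = """<test>
    <test-item>
        <sku>098730</sku>
        <image><![CDATA[image url1]]></image>
        <image3><![CDATA[image url3]]></image3>
    </test-item>
    <test-item>
        <sku>098731</sku>
        <name><![CDATA[No pictures here]]></name>
    </test-item>
</test>"""

for sku, urls in collect_images(sample).items():
    print(sku, urls)
```

Unlike the loop above, this does not stop at the first missing number, and it keeps each item's images together instead of only finding the first match in the whole document.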