Is there a way to extract data based on keyword from XML?

Time:11-08

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<Catalog>
   <CatalogItemLine>
     <NameValue name="Car_id">111-2020-3</NameValue>
     <NameValue name="Car Name">Honda Accord</NameValue>
     <NameValue name="Price at location 98134">40000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">30</NameValue>
     <NameValue name="color">blue</NameValue>
     <NameValue name="door">4</NameValue>
     
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">121-2020-3</NameValue>
     <NameValue name="Car Name">Honda Civic</NameValue>
     <NameValue name="Price at location 98134">30000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">35</NameValue>
     <NameValue name="color">white</NameValue>
     <NameValue name="door">2</NameValue>
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">131-2020-3</NameValue>
     <NameValue name="Car Name">Toyota Camry</NameValue>
     <NameValue name="Price at location 98134">45000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">32</NameValue>
     <NameValue name="color">black</NameValue>
     <NameValue name="door">4</NameValue>
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">151-2020-3</NameValue>
     <NameValue name="Car Name">Honda Pilot</NameValue>
     <NameValue name="Price at location 98134">50000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">30</NameValue>
     <NameValue name="color">gray</NameValue>
     <NameValue name="door">4</NameValue>
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">101-2020-3</NameValue>
     <NameValue name="Car Name">Chevy Malibu</NameValue>
     <NameValue name="Price at location 98134">40000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">30</NameValue>
     <NameValue name="color">white</NameValue>
     <NameValue name="door">4</NameValue>
   </CatalogItemLine>
</Catalog>  

I am trying to fetch the entries that contain the keyword "Honda" from an XML file. I only need four fields (Car_id, Car Name, Price at location 98134, Mpg). The output I want is:

Car_id
111-2020-3
Car Name
Honda Accord
Price at location 98134
40000
Mpg
30

Car_id
121-2020-3
Car Name
Honda Civic
Price at location 98134
30000
Mpg
35

Car_id
151-2020-3
Car Name
Honda Pilot
Price at location 98134
50000
Mpg
30

Code:

import xml.etree.ElementTree as ET

xmlfile= ('cardata.xml')


tree = ET.parse(xmlfile)
root = tree.getroot()

    for CatalogItemLine in root.findall('.//CatalogItemLine'):
        if CatalogItemLine.find('NameValue') is not None:
           NameValue = CatalogItemLine.find('NameValue')
           if NameValue.text is not None:
               if "Honda" in NameValue.text:
                   print(CatalogItemLine.find('NameValue').text)

I was unable to get any output. I have just started learning Python and XML.

I would highly appreciate your help.

CodePudding user response:

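Your loop calls find('NameValue'), which returns only the first NameValue child of each CatalogItemLine. That child is always Car_id, so the "Honda" test never matches. The for loop in the posted code is also indented one level too far, which raises an IndentationError if your file looks the same. Select the element by its name attribute instead, then collect the properties you need: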

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<Catalog>
   <CatalogItemLine>
     <NameValue name="Car_id">111-2020-3</NameValue>
     <NameValue name="Car Name">Honda Accord</NameValue>
     <NameValue name="Price at location 98134">40000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">30</NameValue>
     <NameValue name="color">blue</NameValue>
     <NameValue name="door">4</NameValue>
     
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">121-2020-3</NameValue>
     <NameValue name="Car Name">Honda Civic</NameValue>
     <NameValue name="Price at location 98134">30000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">35</NameValue>
     <NameValue name="color">white</NameValue>
     <NameValue name="door">2</NameValue>
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">131-2020-3</NameValue>
     <NameValue name="Car Name">Toyota Camry</NameValue>
     <NameValue name="Price at location 98134">45000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">32</NameValue>
     <NameValue name="color">black</NameValue>
     <NameValue name="door">4</NameValue>
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">151-2020-3</NameValue>
     <NameValue name="Car Name">Honda Pilot</NameValue>
     <NameValue name="Price at location 98134">50000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">30</NameValue>
     <NameValue name="color">gray</NameValue>
     <NameValue name="door">4</NameValue>
   </CatalogItemLine>
   <CatalogItemLine>
     <NameValue name="Car_id">101-2020-3</NameValue>
     <NameValue name="Car Name">Chevy Malibu</NameValue>
     <NameValue name="Price at location 98134">40000</NameValue>
     <NameValue name="type">Gas</NameValue>
     <NameValue name="Mpg">30</NameValue>
     <NameValue name="color">white</NameValue>
     <NameValue name="door">4</NameValue>
   </CatalogItemLine>
</Catalog>  '''
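# Only these NameValue entries are copied into the result
properties = ['Car_id', 'Car Name', 'Price at location 98134', 'Mpg']
root = ET.fromstring(xml)
data = []
for cil in root.findall('.//CatalogItemLine'):
    # Select the NameValue child by its name attribute, not by position
    nv = cil.find('NameValue[@name="Car Name"]')
    # Skip items that have no car name instead of raising AttributeError
    if nv is None or nv.text is None:
        continue
    if 'honda' in nv.text.lower():
        entry = {}
        for p in properties:
            # findtext returns None when the property is missing
            entry[p] = cil.findtext(f'NameValue[@name="{p}"]')
        data.append(entry)
for entry in data:
    print(entry)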

Output:

{'Car_id': '111-2020-3', 'Car Name': 'Honda Accord', 'Price at location 98134': '40000', 'Mpg': '30'}
{'Car_id': '121-2020-3', 'Car Name': 'Honda Civic', 'Price at location 98134': '30000', 'Mpg': '35'}
{'Car_id': '151-2020-3', 'Car Name': 'Honda Pilot', 'Price at location 98134': '50000', 'Mpg': '30'}
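
The dictionaries above hold the right data, but the question asks for each name and value on its own line with a blank line between cars. A minimal sketch of that formatting follows. The helper name format_matches, the FIELDS constant, and the shortened SAMPLE string are assumptions made for illustration; they are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same structure as the catalog in the question.
SAMPLE = """<Catalog>
  <CatalogItemLine>
    <NameValue name="Car_id">111-2020-3</NameValue>
    <NameValue name="Car Name">Honda Accord</NameValue>
    <NameValue name="Price at location 98134">40000</NameValue>
    <NameValue name="Mpg">30</NameValue>
  </CatalogItemLine>
  <CatalogItemLine>
    <NameValue name="Car_id">131-2020-3</NameValue>
    <NameValue name="Car Name">Toyota Camry</NameValue>
    <NameValue name="Price at location 98134">45000</NameValue>
    <NameValue name="Mpg">32</NameValue>
  </CatalogItemLine>
</Catalog>"""

FIELDS = ['Car_id', 'Car Name', 'Price at location 98134', 'Mpg']


def format_matches(root, keyword, fields=FIELDS):
    """Return name/value lines for every item whose 'Car Name' contains keyword."""
    blocks = []
    for item in root.findall('.//CatalogItemLine'):
        # findtext returns the default ('' here) when the element is missing
        car_name = item.findtext('NameValue[@name="Car Name"]', default='')
        if keyword.lower() not in car_name.lower():
            continue
        lines = []
        for field in fields:
            value = item.findtext(f'NameValue[@name="{field}"]', default='')
            lines.extend([field, value])
        blocks.append('\n'.join(lines))
    # One blank line between cars, as in the requested output
    return '\n\n'.join(blocks)


root = ET.fromstring(SAMPLE)
print(format_matches(root, 'Honda'))
```

To run this against the file from the question, replace ET.fromstring(SAMPLE) with ET.parse('cardata.xml').getroot().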