Find Nested Tags in an XML File and Convert Them to a DataFrame in Python

Time:09-25

I am new to parsing XML and am trying to parse this data with nested tags, but I am running into some issues. Help would be appreciated.

I have this data:

<Items>
    <Item MaintenanceType="A">
        <ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
        <PartNumber>VRX42</PartNumber>
        <BrandAAIAID>JHHK</BrandAAIAID>
        <PartTerminologyID>1896</PartTerminologyID>

        <Descriptions>
            <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
            </Description>
        </Descriptions>

        <ExtendedInformation>
            <ExtendedProductInformation MaintenanceType="A" LanguageCode="EN" EXPICode="CTO">Germany</ExtendedProductInformation>
        </ExtendedInformation>
        
        <ProductAttributes/>
        
        <Packages>
            <Package MaintenanceType="A">
            <PackageUOM>EA</PackageUOM>
            <QuantityofEaches>1</QuantityofEaches>

            <Dimensions UOM="IN">
                <Length>17.1300</Length>
                <Width>15.7400</Width>
                <Height>3.5400</Height>
            </Dimensions>
        
        
            <Weights UOM="lb">
                <Weight>25.1000</Weight>
            </Weights>
        
            </Package>
        </Packages>
        
        <DigitalAssets>
            <DigitalFileInformation MaintenanceType="A" LanguageCode="EN">
            <FileName>VRX47K.PNG</FileName>
            <AssetType>P04</AssetType>
            <FileSize>500061</FileSize>
            <AssetDimensions UOM="PX">
                <AssetHeight>499</AssetHeight>
                <AssetWidth>512</AssetWidth>
            </AssetDimensions>
        
            <FileDateModified>2021-09-02</FileDateModified>
        
            <URI>https://www.asapnetwork.org</URI>
        
            </DigitalFileInformation>
        
        </DigitalAssets>
    
    </Item>
</Items>

I want to fetch the text of each tag. I tried the following code, but it raises an error. Can anyone help?

My code is given below:

import xml.etree.ElementTree as ETree
import pandas as pd

xmldata = "0H-2021-09-15-Pies.xml"

prstree = ETree.parse(xmldata)
root = prstree.getroot()

store_items = []
all_items = []

cols = ["ItemLevelGTIN", "PartNumber", "BrandAAIAID", "PartTerminologyID", "Description","ExtendedProductInformation", \
        "PackageUOM", "QuantityofEaches", "Length", "Width", "Height", "Weight", "FileName", "AssetType", "FileSize", \
        "AssetHeight", "AssetWidth", "FileDateModified", "URI"]


for child in root.iter('Items'):
    children = child.findall('Item')
    for elem in children:
        ItemLevelGTIN = elem.find("ItemLevelGTIN").text
        PartNumber = elem.find("PartNumber").text
        BrandAAIAID = elem.find("BrandAAIAID").text
        PartTerminologyID = elem.find("PartTerminologyID").text
        Description = elem.find("Description").text
        ExtendedProductInformation = elem.find("ExtendedProductInformation").text
        PackageUOM = elem.find("PackageUOM").text
        QuantityofEaches = elem.find("QuantityofEaches").text
        Length = elem.find("Length").text
        Width = elem.find("Width").text
        Height = elem.find("Height").text
        FileName = elem.find("FileName").text
        AssetType = elem.find("AssetType").text
        FileSize = elem.find("FileSize").text
        AssetHeight = elem.find("AssetHeight").text
        AssetWidth = elem.find("AssetWidth").text
        FileDateModified = elem.find("FileDateModified").text
        URI = elem.find("URI").text
        
        
        store_items = [ItemLevelGTIN, PartNumber, BrandAAIAID, PartTerminologyID,ExtendedProductInformation,Description,\
                      ExtendedProductInformation,PackageUOM,QuantityofEaches, Length, Width, Height, FileName, AssetType,\
                      FileSize, AssetHeight, AssetWidth, FileDateModified,URI ]

        all_items.append(store_items)

xmlToDf = pd.DataFrame(all_items, columns=cols)
print(xmlToDf.to_string(index=True)) 

The error is given below:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-53b0d91d646f> in <module>
     22         BrandAAIAID = elem.find("BrandAAIAID").text
     23         PartTerminologyID = elem.find("PartTerminologyID").text
---> 24         Description = elem.find("Description").text
     25         ExtendedProductInformation = elem.find("ExtendedProductInformation").text
     26         PackageUOM = elem.find("PackageUOM").text

AttributeError: 'NoneType' object has no attribute 'text'

Answer:

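find() only searches the direct children of an element. Description is nested inside Descriptions, so elem.find("Description") returns None, and accessing .text on None raises the AttributeError.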

Try this instead:

Description = elem.find("Descriptions")[0].text

This first finds the parent (Descriptions) and then takes its first child (Description).
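
To see the difference between a direct-child lookup and a nested one, here is a minimal sketch on a trimmed-down copy of the XML from the question. The variable names (xml_text, direct, nested, anywhere) are only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the XML from the question
xml_text = """
<Items>
    <Item MaintenanceType="A">
        <PartNumber>VRX42</PartNumber>
        <Descriptions>
            <Description DescriptionCode="MKT">Bentley Brake Rotors designs,</Description>
        </Descriptions>
    </Item>
</Items>
"""

root = ET.fromstring(xml_text)
item = root.find("Item")

# A bare tag name only matches direct children of Item, so this is None
direct = item.find("Description")

# A path that goes through the parent reaches the nested element
nested = item.find("Descriptions/Description")

# ".//" searches all descendants, at any depth
anywhere = item.find(".//Description")

print(direct)         # None
print(nested.text)
print(anywhere.text)
```

Both the explicit path and the ".//" form reach the nested element. The explicit path is stricter about where the tag may appear.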

The same problem occurs in several other places in your code (for example ExtendedProductInformation, PackageUOM, Length and FileName are also nested), so those lookups need the same fix.
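
One way to fix all of the lookups at once is to build each row from the list of column names. The following is a rough sketch, not the only way to do it. It assumes each tag name appears at most once per Item, because ".//name" returns only the first match. The helper item_to_row and the shortened cols list are made up for this example. Building the row from cols also avoids a second problem in the question's code: store_items lists ExtendedProductInformation twice and never reads Weight, so its values would not line up with cols.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the XML in the question
xml_text = """<Items>
  <Item MaintenanceType="A">
    <ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
    <PartNumber>VRX42</PartNumber>
    <Descriptions>
      <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
      </Description>
    </Descriptions>
    <Packages>
      <Package MaintenanceType="A">
        <PackageUOM>EA</PackageUOM>
        <Dimensions UOM="IN">
          <Length>17.1300</Length>
        </Dimensions>
        <Weights UOM="lb">
          <Weight>25.1000</Weight>
        </Weights>
      </Package>
    </Packages>
  </Item>
</Items>"""

# Shortened column list; URI is deliberately absent from the sample above
cols = ["ItemLevelGTIN", "PartNumber", "Description", "PackageUOM",
        "Length", "Weight", "URI"]


def item_to_row(item, columns):
    """Hypothetical helper: map each column name to the text of the first
    descendant with that tag, or None when the tag is missing."""
    row = {}
    for name in columns:
        text = item.findtext(".//" + name)  # None if no such descendant
        row[name] = text.strip() if text is not None else None
    return row


root = ET.fromstring(xml_text)
rows = [item_to_row(item, cols) for item in root.iter("Item")]
print(rows)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows, columns=cols), the same call the question already uses. Missing tags end up as None instead of raising an error.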

Edit:

You can also look the child up by name instead of by position:

Description = elem.find("Descriptions").find("Description").text
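
One caveat with chained find() calls: if the parent element is missing, the first find() returns None and the same AttributeError comes back. Below is a minimal sketch of a more forgiving variant, assuming an empty string is an acceptable placeholder for missing data. It uses findtext() with a path and a default value. ProductAttribute is a hypothetical child name used only for illustration; the sample in the question shows just an empty ProductAttributes element.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring("""
<Item>
    <Descriptions>
        <Description>Bentley Brake Rotors designs,</Description>
    </Descriptions>
    <ProductAttributes/>
</Item>
""")

# Present: returns the element's text
description = item.findtext("Descriptions/Description", default="")

# Absent (hypothetical child name): returns the default instead of raising
attribute = item.findtext("ProductAttributes/ProductAttribute", default="")

print(repr(description))
print(repr(attribute))
```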