I am new to parsing XML and am having trouble parsing data with nested tags. Any help would be appreciated.
I have this data:
<Items>
<Item MaintenanceType="A">
<ItemLevelGTIN GTINQualifier="UP">0006582</ItemLevelGTIN>
<PartNumber>VRX42</PartNumber>
<BrandAAIAID>JHHK</BrandAAIAID>
<PartTerminologyID>1896</PartTerminologyID>
<Descriptions>
<Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
</Description>
</Descriptions>
<ExtendedInformation>
<ExtendedProductInformation MaintenanceType="A" LanguageCode="EN" EXPICode="CTO">Germany</ExtendedProductInformation>
</ExtendedInformation>
<ProductAttributes/>
<Packages>
<Package MaintenanceType="A">
<PackageUOM>EA</PackageUOM>
<QuantityofEaches>1</QuantityofEaches>
<Dimensions UOM="IN">
<Length>17.1300</Length>
<Width>15.7400</Width>
<Height>3.5400</Height>
</Dimensions>
<Weights UOM="lb">
<Weight>25.1000</Weight>
</Weights>
</Package>
</Packages>
<DigitalAssets>
<DigitalFileInformation MaintenanceType="A" LanguageCode="EN">
<FileName>VRX47K.PNG</FileName>
<AssetType>P04</AssetType>
<FileSize>500061</FileSize>
<AssetDimensions UOM="PX">
<AssetHeight>499</AssetHeight>
<AssetWidth>512</AssetWidth>
</AssetDimensions>
<FileDateModified>2021-09-02</FileDateModified>
<URI>https://www.asapnetwork.org</URI>
</DigitalFileInformation>
</DigitalAssets>
</Item>
</Items>
I want to extract the text from each tag. I tried the following code, but it raises the error shown below. Can anyone help?
My code is given below:
import xml.etree.ElementTree as ETree
import pandas as pd
xmldata = "0H-2021-09-15-Pies.xml"
prstree = ETree.parse(xmldata)
root = prstree.getroot()
store_items = []
all_items = []
cols = ["ItemLevelGTIN", "PartNumber", "BrandAAIAID", "PartTerminologyID", "Description", "ExtendedProductInformation",
        "PackageUOM", "QuantityofEaches", "Length", "Width", "Height", "Weight", "FileName", "AssetType", "FileSize",
        "AssetHeight", "AssetWidth", "FileDateModified", "URI"]
for child in root.iter('Items'):
    children = child.findall('Item')
    for elem in children:
        ItemLevelGTIN = elem.find("ItemLevelGTIN").text
        PartNumber = elem.find("PartNumber").text
        BrandAAIAID = elem.find("BrandAAIAID").text
        PartTerminologyID = elem.find("PartTerminologyID").text
        Description = elem.find("Description").text
        ExtendedProductInformation = elem.find("ExtendedProductInformation").text
        PackageUOM = elem.find("PackageUOM").text
        QuantityofEaches = elem.find("QuantityofEaches").text
        Length = elem.find("Length").text
        Width = elem.find("Width").text
        Height = elem.find("Height").text
        FileName = elem.find("FileName").text
        AssetType = elem.find("AssetType").text
        FileSize = elem.find("FileSize").text
        AssetHeight = elem.find("AssetHeight").text
        AssetWidth = elem.find("AssetWidth").text
        FileDateModified = elem.find("FileDateModified").text
        URI = elem.find("URI").text
        store_items = [ItemLevelGTIN, PartNumber, BrandAAIAID, PartTerminologyID, ExtendedProductInformation, Description,
                       ExtendedProductInformation, PackageUOM, QuantityofEaches, Length, Width, Height, FileName, AssetType,
                       FileSize, AssetHeight, AssetWidth, FileDateModified, URI]
        all_items.append(store_items)
xmlToDf = pd.DataFrame(all_items, columns=cols)
print(xmlToDf.to_string(index=True))
The error is given below:
AttributeError Traceback (most recent call last)
<ipython-input-3-53b0d91d646f> in <module>
22 BrandAAIAID = elem.find("BrandAAIAID").text
23 PartTerminologyID = elem.find("PartTerminologyID").text
---> 24 Description = elem.find("Description").text
25 ExtendedProductInformation = elem.find("ExtendedProductInformation").text
26 PackageUOM = elem.find("PackageUOM").text
AttributeError: 'NoneType' object has no attribute 'text'
CodePudding user response:
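Description is not a direct child of Item; it is nested inside Descriptions. find with a bare tag name only searches the direct children of the element, so elem.find("Description") returns None and the .text access fails.
Try this instead:
Description = elem.find("Descriptions")[0].text
This first finds the parent (Descriptions) and then takes its first child (Description).
The same problem occurs in several other places in your code (for example PackageUOM, Length, FileName and URI are also nested), so those lookups need the same fix.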
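To make the difference concrete, here is a minimal sketch using a trimmed-down copy of the question's XML (the inline sample string and the variable names are only for illustration). It compares a bare tag name, which only looks at direct children, with two path forms that ElementTree's limited XPath support accepts.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same nesting as the question's data (illustrative only).
sample = """<Items>
  <Item MaintenanceType="A">
    <PartNumber>VRX42</PartNumber>
    <Descriptions>
      <Description DescriptionCode="MKT">Bentley Brake Rotors designs,</Description>
    </Descriptions>
  </Item>
</Items>"""

item = ET.fromstring(sample).find("Item")

direct = item.find("Description")                # None: Description is not a direct child of Item
by_path = item.find("Descriptions/Description")  # explicit parent/child path
anywhere = item.find(".//Description")           # first match at any depth below Item

print(direct)
print(by_path.text)
print(anywhere.text)
```

The ".//" form is convenient when you do not want to spell out every parent, but it returns only the first match, so it is probably best kept for tags that occur once per Item.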
Edit:
You can also try this, which selects the child by name instead of by position:
Description = elem.find("Descriptions").find("Description").text
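Putting it together, the whole loop could look roughly like the sketch below. It is only one possible approach, not the only fix: it assumes each tag of interest appears at most once per Item, uses findtext with a ".//" path so nested tags are found at any depth, and stores None when a tag is missing instead of raising AttributeError. The inline sample, the shortened column list and the helper name item_to_row are made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the question's file (illustrative only).
sample = """<Items>
  <Item MaintenanceType="A">
    <PartNumber>VRX42</PartNumber>
    <Descriptions>
      <Description MaintenanceType="A" DescriptionCode="MKT" LanguageCode="EN">Bentley Brake Rotors designs,
      </Description>
    </Descriptions>
    <ProductAttributes/>
    <Packages>
      <Package MaintenanceType="A">
        <PackageUOM>EA</PackageUOM>
        <Weights UOM="lb">
          <Weight>25.1000</Weight>
        </Weights>
      </Package>
    </Packages>
  </Item>
</Items>"""

# Shortened column list; "URI" is deliberately absent from the sample above.
cols = ["PartNumber", "Description", "PackageUOM", "Weight", "URI"]


def item_to_row(item):
    """Return {column: text} for one Item; tags that are missing map to None."""
    row = {}
    for name in cols:
        text = item.findtext(f".//{name}")  # first match at any depth below Item
        row[name] = text.strip() if text is not None else None
    return row


root = ET.fromstring(sample)
rows = [item_to_row(item) for item in root.iter("Item")]
print(rows)
```

With the real file you would use ETree.parse(xmldata).getroot() as in the question, and pd.DataFrame(rows, columns=cols) turns the list of dictionaries into the DataFrame. Building each row from the column list also keeps values and columns aligned; in the question's store_items list, ExtendedProductInformation appears twice and Weight is never read, so the columns would not line up even once the lookups succeed.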