ValueError when adding additional xml tags with text

Time:03-30

I am new to Python and have a file.xml with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Hair</FNAME>
            <FVALUE>medium</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>green</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Legs</FNAME>
            <FVALUE>14</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>

I use this code:

from lxml import etree as et
import pandas as pd

xml_data = et.parse('file.xml')
products = xml_data.xpath('//HEADER')

headers=[elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(xml_data.xpath('//HEADER[1]//FNAME/text()'))

rows = []

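for product in products:
    # The first two columns are hard-coded: DESCRIPTION_SHORT and DESCRIPTION_LONG
    row = [
        product.xpath(f'.//{headers[0]}/text()')[0],
        product.xpath(f'.//{headers[1]}/text()')[0],
    ]
    # One value per FEATURE, in document order
    f_values = product.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)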

df = pd.DataFrame(rows,columns=headers)

df
# df.to_csv("File_Export_V1.csv", index=False)

to receive this output:

    DESCRIPTION_SHORT       DESCRIPTION_LONG                                Hair    Colour  Legs
0   green cat w short hair  green cat w short hair and unlimited zoomies    medium  green   14

When I edit my file.xml and add a new line like this:

<PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
    <BUYER_PID type="supplier_specific">100000000</BUYER_PID>
</PRODUCT_DETAILS>

I receive these errors:

AssertionError: 6 columns passed, passed data had 5 columns
ValueError: 6 columns passed, passed data had 5 columns

Most of my code is curated, so I'm not 100% sure what does what (I'm very new). Which part of my code do I have to tweak to account for this change? I'd like to keep the output structure.

Thank you! ~C

CodePudding user response:

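That error occurs because the number of columns has changed while the number of items in your row variable has not. Adding the <BUYER_PID type="supplier_specific">100000000</BUYER_PID> child node to <PRODUCT_DETAILS> makes headers six items long (three detail tags plus three feature names), but row still holds only five values, because it reads just headers[0] and headers[1] before appending the three FVALUE entries.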
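To see the mismatch directly, here is a minimal sketch that rebuilds headers and one row the same way the question does and prints both lengths. It assumes the edited XML from the question, inlined as a string, and uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The edited file from the question, inlined so the sketch is self-contained.
XML = """<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
        <BUYER_PID type="supplier_specific">100000000</BUYER_PID>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE><FNAME>Hair</FNAME><FVALUE>medium</FVALUE></FEATURE>
        <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
        <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
</HEADER>"""

product = ET.fromstring(XML)

# Headers: every child tag of PRODUCT_DETAILS, then every FNAME text.
headers = [child.tag for child in product.find("PRODUCT_DETAILS")]
headers += [name.text for name in product.iter("FNAME")]

# Row, built as in the question: only the first two detail tags are read.
row = [
    product.findtext(f".//{headers[0]}"),
    product.findtext(f".//{headers[1]}"),
]
row += [value.text for value in product.iter("FVALUE")]

print(len(headers), len(row))  # 6 5
```

The row is one item short because nothing ever reads the third detail tag, BUYER_PID.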

So if you change that variable to read:

row = [product.xpath(f'.//{headers[0]}/text()')[0],product.xpath(f'.//{headers[1]}/text()')[0],product.xpath(f'.//{headers[2]}/text()')[0]]

the error should disappear.
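If more tags are likely to be added later, hard-coding each index will break again every time. One possible alternative, offered only as a sketch, is to loop over whatever children <PRODUCT_DETAILS> has and build one dictionary per product. The helper name product_to_record is hypothetical, and the sketch again uses the standard library's xml.etree.ElementTree rather than lxml so it runs on its own.

```python
import xml.etree.ElementTree as ET

# The edited file from the question, inlined so the sketch is self-contained.
XML = """<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
        <BUYER_PID type="supplier_specific">100000000</BUYER_PID>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE><FNAME>Hair</FNAME><FVALUE>medium</FVALUE></FEATURE>
        <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
        <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
</HEADER>"""


def product_to_record(product):
    """Hypothetical helper: map column name -> value for one <HEADER> element."""
    record = {}
    details = product.find("PRODUCT_DETAILS")
    if details is not None:
        # Pick up every detail tag, however many there are.
        for child in details:
            record[child.tag] = (child.text or "").strip()
    # Each FEATURE contributes one column named after its FNAME.
    for feature in product.iter("FEATURE"):
        name = feature.findtext("FNAME")
        if name is not None:
            record[name] = feature.findtext("FVALUE")
    return record


root = ET.fromstring(XML)
# iter() also yields the root itself when its tag matches.
products = list(root.iter("HEADER"))
records = [product_to_record(p) for p in products]
print(records[0])
```

Passing the list to pd.DataFrame(records) should then give one column per key, in the same order as before, with BUYER_PID placed after DESCRIPTION_LONG. Because each value is stored under its own name, a product that lacks a tag ends up with a missing value in that column instead of raising a column-count error.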
