I am new to Python and have a file.xml with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Hair</FNAME>
<FVALUE>medium</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>green</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Legs</FNAME>
<FVALUE>14</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
I use this code:
from lxml import etree as et
import pandas as pd
xml_data = et.parse('file.xml')
products = xml_data.xpath('//HEADER')
headers=[elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(xml_data.xpath('//HEADER[1]//FNAME/text()'))
rows = []
for product in products:
    row = [product.xpath(f'.//{headers[0]}/text()')[0], product.xpath(f'.//{headers[1]}/text()')[0]]
    f_values = product.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)
df = pd.DataFrame(rows, columns=headers)
df
# df.to_csv("File_Export_V1.csv", index=False)
to receive this output:
        DESCRIPTION_SHORT                              DESCRIPTION_LONG    Hair Colour Legs
0  green cat w short hair  green cat w short hair and unlimited zoomies  medium  green   14
When I edit my file.xml and add a new line like this:
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
<BUYER_PID type="supplier_specific">100000000</BUYER_PID>
</PRODUCT_DETAILS>
I receive these errors:
AssertionError: 6 columns passed, passed data had 5 columns
ValueError: 6 columns passed, passed data had 5 columns
Most of my code is pieced together from other sources, so I'm not 100% sure what does what (I'm very new to this). Which part of my code do I have to tweak to account for these changes? I'd like to keep the output structure.
Thank you! ~C
CodePudding user response:
That error occurs because the number of columns has changed: adding the <BUYER_PID type="supplier_specific">100000000</BUYER_PID> child node to <PRODUCT_DETAILS> gives headers six entries, but your row variable still collects only five values (two descriptions plus three FVALUE items).
So if you change that variable to read:
row = [product.xpath(f'.//{headers[0]}/text()')[0],product.xpath(f'.//{headers[1]}/text()')[0],product.xpath(f'.//{headers[2]}/text()')[0]]
the error should disappear.
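If more child nodes may be added to <PRODUCT_DETAILS> later, hard-coding headers[0], headers[1], headers[2] will break again each time. One possible way around that is to build each row by looping over whatever children <PRODUCT_DETAILS> actually has. The following is only a rough sketch under some assumptions: it uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra installs, it embeds the XML as a string rather than reading file.xml, and product_to_row is a made-up helper name, not something from your code.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (with the extra BUYER_PID node).
# Assumption: in real use this would come from ET.parse('file.xml') instead.
XML = """<HEADER>
  <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
    <BUYER_PID type="supplier_specific">100000000</BUYER_PID>
  </PRODUCT_DETAILS>
  <PRODUCT_FEATURES>
    <FEATURE><FNAME>Hair</FNAME><FVALUE>medium</FVALUE></FEATURE>
    <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
    <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
  </PRODUCT_FEATURES>
</HEADER>"""


def product_to_row(product):
    """Hypothetical helper: turn one <HEADER> element into a {column: value} dict."""
    row = {}
    # One column per child of PRODUCT_DETAILS, however many there are.
    for child in product.find("PRODUCT_DETAILS"):
        row[child.tag] = child.text
    # One column per FEATURE, named by FNAME and filled with FVALUE.
    for feature in product.iter("FEATURE"):
        row[feature.findtext("FNAME")] = feature.findtext("FVALUE")
    return row


root = ET.fromstring(XML)
# iter() also yields the root itself when its tag matches.
products = list(root.iter("HEADER"))
rows = [product_to_row(p) for p in products]
headers = list(rows[0])

print(headers)
print(rows[0])
```

Because each row is a dict keyed by column name, the column list and the row values cannot get out of step. pd.DataFrame(rows) accepts a list of dicts like this and takes the column names from the keys, so the rest of your script (including to_csv) should be able to stay as it is.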