I have about 100,000 price-table records in XML and I need to remove entries where the price amount is 0.00. The data is structured as follows:
<data>
  <price-table product-id="100109a">
    <amount quantity="1">10.00</amount>
  </price-table>
  <price-table product-id="201208c">
    <amount quantity="1">0.00</amount>
  </price-table>
</data>
I'm trying to use Python to do the work and I have the following:
from xml.etree import ElementTree as ET

def readfile():
    with open('prices.xml') as f:
        contents = f.read()
    return(contents)

xml_string = readfile()
root = ET.fromstring(xml_string)
for price_table in root.findall('price-table'):
    amount = price_table.find('amount')
    if float(amount.text) != 0:
        root.remove(price_table)
xmltowrite = ET.tostring(root)
#print(xmltowrite)
with open('xmlwrite.txt', 'w') as j:
    j.write(xmltowrite)
When I run this, the error I get is:
TypeError: write() argument must be str, not bytes
But my understanding is that the ET.tostring() function should be converting the xmltowrite value to a string... Why is that not a string at the end?
CodePudding user response:
If you print type(xmltowrite), you will see it is <class 'bytes'>. You can decode it with ET.tostring(root).decode("utf-8"), and then you get <class 'str'>.
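A minimal sketch of that check, using a small inline document instead of prices.xml (the inline XML and the variable names raw and text are only for illustration):

```python
from xml.etree import ElementTree as ET

# Tiny stand-in for the real file, so the snippet runs on its own.
root = ET.fromstring(
    '<data><price-table product-id="100109a">'
    '<amount quantity="1">10.00</amount>'
    '</price-table></data>'
)

raw = ET.tostring(root)      # bytes by default
text = raw.decode("utf-8")   # decoded to str

print(type(raw))
print(type(text))
```

Once decoded, the value can be passed to write() on a file opened in text mode ('w').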
CodePudding user response:
tostring() returns a bytes object unless encoding="unicode" is used.
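As a rough illustration, passing encoding="unicode" gives a str that a text-mode file accepts directly. The inline XML and the temporary output path below are assumptions made so the snippet is self-contained:

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Small inline document standing in for prices.xml.
root = ET.fromstring(
    '<data><price-table product-id="100109a">'
    '<amount quantity="1">10.00</amount>'
    '</price-table></data>'
)

xml_text = ET.tostring(root, encoding="unicode")  # str, not bytes

# Hypothetical output location; the question writes to 'xmlwrite.txt'.
out_path = os.path.join(tempfile.gettempdir(), "xmlwrite_demo.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(xml_text)  # no TypeError: write() receives a str
```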
The code can be simplified quite a bit. There is no need to use open(), fromstring() or tostring(). Just parse the XML file into an ElementTree object, make your changes, and save using ElementTree.write().
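from xml.etree import ElementTree as ET

tree = ET.parse("prices.xml")
root = tree.getroot()
# findall() returns a list, so removing children inside the loop is safe.
for price_table in root.findall('price-table'):
    amount = price_table.find('amount')
    # == 0 (not != 0): the stated goal is to drop entries priced at 0.00.
    if float(amount.text) == 0:
        root.remove(price_table)
tree.write('xmlwrite.txt')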
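To sanity-check the filter without touching the real file, here is a small sketch that runs it on the sample data from the question. It assumes the goal stated in the question (drop entries whose amount is 0.00); the guard for a missing or empty <amount> is an added precaution, not something the original data is known to need:

```python
from xml.etree import ElementTree as ET

# Sample data copied from the question.
sample = """<data>
<price-table product-id="100109a">
<amount quantity="1">10.00</amount>
</price-table>
<price-table product-id="201208c">
<amount quantity="1">0.00</amount>
</price-table>
</data>"""

root = ET.fromstring(sample)
for price_table in root.findall("price-table"):
    amount = price_table.find("amount")
    # Defensive guard (assumption): ignore tables without a usable amount.
    if amount is not None and amount.text and float(amount.text) == 0:
        root.remove(price_table)

remaining = [pt.get("product-id") for pt in root.findall("price-table")]
print(remaining)  # ['100109a']
```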