I'm having some issues when an XML file is parsed to JSON. This is what the XML file looks like:
<return>
<ciudad>BARRANQUILLA</ciudad>
<codProducto>5</codProducto>
<enviado>0</enviado>
<fechaCaptura>2020-03-18T00:00:00-05:00</fechaCaptura>
<fechaCreacion>2020-03-18T14:00:01-05:00</fechaCreacion>
<precioPromedio>811</precioPromedio>
<producto>Chócolo mazorca</producto>
<regId>316992</regId>
</return>
<return>
<ciudad>BARRANQUILLA</ciudad>
<codProducto>8</codProducto>
<enviado>0</enviado>
<fechaCaptura>2020-03-18T00:00:00-05:00</fechaCaptura>
<fechaCreacion>2020-03-18T14:00:01-05:00</fechaCreacion>
<precioPromedio>2063</precioPromedio>
<producto>Pimentón</producto>
<regId>316995</regId>
</return>
This is the code that I'm using to parse the file, using the xmltodict library:
import json
import xmltodict

with open('result.xml', 'r', encoding='iso-8859-1') as xmlarch:
    with open('result.json', 'w', encoding='iso-8859-1') as json_f:
        obj = xmltodict.parse(xmlarch.read())
        json.dump(obj, json_f, indent=4)
But some characters come out escaped in the JSON file after the file is parsed:
[
{
"ciudad": "BARRANQUILLA",
"codProducto": "5",
"enviado": "0",
"fechaCaptura": "2020-03-18T00:00:00-05:00",
"fechaCreacion": "2020-03-18T14:00:01-05:00",
"precioPromedio": "811",
"producto": "Ch\u00f3colo mazorca",
"regId": "316992"
},
{
"ciudad": "BARRANQUILLA",
"codProducto": "8",
"enviado": "0",
"fechaCaptura": "2020-03-18T00:00:00-05:00",
"fechaCreacion": "2020-03-18T14:00:01-05:00",
"precioPromedio": "2063",
"producto": "Piment\u00f3n",
"regId": "316995"
}
]
I have never worked with a file that contains these Spanish characters. I think the problem may be the encoding. I already tried some other encodings such as utf-8, but that did not work. Any feedback is appreciated!
CodePudding user response:
You need to pass ensure_ascii=False to json.dump (the JSON serializer, not the parser).
From the json documentation:
"This module's serializer sets ensure_ascii=True by default, thus escaping the output so that the resulting strings only contain ASCII characters."
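To see the effect in isolation, here is a minimal sketch using only the standard library. The `record` dict is a made-up stand-in for one parsed row, not the real output of xmltodict:

```python
import json

# Hypothetical sample row, standing in for one parsed <return> element
record = {"producto": "Chócolo mazorca"}

escaped = json.dumps(record)                      # default: ensure_ascii=True
literal = json.dumps(record, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)  # {"producto": "Ch\u00f3colo mazorca"}
print(literal)  # {"producto": "Chócolo mazorca"}

# Both forms are valid JSON and decode to the same Python object
print(json.loads(escaped) == json.loads(literal))  # True
```

Note that the escaped form is not corrupted data: any JSON parser turns `\u00f3` back into `ó`. The flag only changes how the text looks in the file.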
So the result is:
import json
import xmltodict

with open('result.xml', 'r', encoding='iso-8859-1') as xmlarch:
    with open('result.json', 'w', encoding='iso-8859-1') as json_f:
        obj = xmltodict.parse(xmlarch.read())
        # ensure_ascii=False writes "ó" as-is instead of "\u00f3"
        json.dump(obj, json_f, indent=4, ensure_ascii=False)
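One thing to consider: JSON shared between programs is normally expected to be UTF-8, so if other tools will read result.json it may be safer to write the output as UTF-8 rather than iso-8859-1. A rough sketch of that variation follows. It assumes a hand-written `obj` in place of the real xmltodict result and a temporary file in place of result.json:

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dict that xmltodict.parse() would return
obj = {"return": [{"producto": "Chócolo mazorca"}, {"producto": "Pimentón"}]}

# Temporary path used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "result.json")

# Write the JSON as UTF-8, keeping the accented characters unescaped
with open(path, "w", encoding="utf-8") as json_f:
    json.dump(obj, json_f, indent=4, ensure_ascii=False)

# Read it back with the same encoding to confirm nothing was lost
with open(path, "r", encoding="utf-8") as json_f:
    text = json_f.read()
restored = json.loads(text)

print("Pimentón" in text)   # True
print(restored == obj)      # True
```

The input file can still be opened with encoding='iso-8859-1' if that is how the XML was saved. The read and write encodings do not have to match, because Python strings are Unicode in between.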