I am currently trying to convert an XML document with approximately 2k records to JSON so I can upload it to MongoDB. I have written a Python script for the conversion, but when I import the output into MongoDB, the collection reads it as one document containing 2k sub-objects, whereas I want 2k separate documents. My guess is that the problem is in the Python code. Can anyone help?
# Program to convert an XML file to a JSON file
# using the json module and the xmltodict module
import json
import xmltodict

# Open the input XML file and parse its contents
# into a Python dictionary using xmltodict
with open("test.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

# Serialize the dictionary to a JSON string
json_data = json.dumps(data_dict)

# Write the JSON data to the output file
# (the with statement closes the file automatically)
with open("data.json", "w") as json_file:
    json_file.write(json_data)
CodePudding user response:
I am not sure why you would expect an XML-to-JSON converter to automatically split the XML at "record" boundaries. After all, XML doesn't have a built-in concept of "records" - that's something in the semantics of your vocabulary, not in the syntax of XML.
The easiest way to split an XML file into multiple files is with a simple XSLT 2.0 stylesheet. If you use XSLT 3.0, you can invoke the JSON conversion at the same time.
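If you would rather stay in Python than bring in XSLT, you can do the splitting by hand once you know which element repeats. The sketch below is only an illustration: the element names (root, record) are assumptions, so substitute the names from your actual file.

import json
import xmltodict

with open("test.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

# Assumed structure: <root><record>...</record><record>...</record></root>.
# xmltodict yields a list for a repeated element, but a single dict
# when the element occurs exactly once, hence the isinstance check.
records = data_dict["root"]["record"]
if not isinstance(records, list):
    records = [records]

# Write each record to its own JSON file
for i, record in enumerate(records):
    with open(f"record_{i}.json", "w") as out_file:
        json.dump(record, out_file, indent=2)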
CodePudding user response:
Here is my solution: instead of dumping the whole parsed document as one object, extract the list of records and dump that.
import xmltodict
import json

# Open the XML file in binary mode; xmltodict.parse accepts
# the file object directly, no need to read() it first
with open(r"test.xml", "rb") as xml_file:
    dict_data = xmltodict.parse(xml_file)

# Pull out the list of records rather than the whole document
output_data = dict_data["root"]["course_listing"]

json_data = json.dumps(output_data, indent=2)
print(json_data)

with open("datanew.json", "w") as json_file:
    json_file.write(json_data)
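Note that datanew.json still holds a single top-level JSON array, so importing it as-is reproduces the one-document problem; mongoimport needs the --jsonArray flag to unpack an array into separate documents. Another option is to skip the intermediate file and insert the parsed records directly with pymongo. The sketch below assumes a local MongoDB instance, and the database and collection names (mydb, courses) are placeholders for your own setup.

import xmltodict
from pymongo import MongoClient

with open("test.xml", "rb") as xml_file:
    dict_data = xmltodict.parse(xml_file)

# Same record list as above; guard against the single-record case,
# where xmltodict returns a dict instead of a list
records = dict_data["root"]["course_listing"]
if not isinstance(records, list):
    records = [records]

# Placeholder connection details, replace with your own
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["courses"]

# insert_many creates one MongoDB document per element of the list
collection.insert_many(records)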