I am currently trying to convert an XML document with approximately 2k records to JSON so I can upload it to MongoDB. I have written a Python script for the conversion, but when I import the output into MongoDB, the collection reads it as one document containing 2k sub-objects, whereas I want 2k separate documents. My guess is that the problem is in the Python code. Can anyone help?
# Program to convert an XML file to a JSON file
# using the json module and the xmltodict module
import json
import xmltodict

# Open the input XML file and parse its contents
# into a Python dictionary using xmltodict
with open("test.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

# Serialize the dictionary to a JSON string
json_data = json.dumps(data_dict)

# Write the JSON data to the output file
# (the with statement closes the file automatically)
with open("data.json", "w") as json_file:
    json_file.write(json_data)
CodePudding user response:
I am not sure why you would expect an XML-to-JSON converter to automatically split the XML at "record" boundaries. After all, XML doesn't have a built-in concept of "records" - that's something in the semantics of your vocabulary, not in the syntax of XML.
The easiest way to split an XML file into multiple files is with a simple XSLT 2.0 stylesheet. If you use XSLT 3.0, you can invoke the JSON conversion at the same time.
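If you would rather stay in Python than bring in XSLT, you can do the splitting by hand once you know which element repeats. The sketch below is only an illustration: the element names (root, record) are assumptions, so substitute the names from your actual file.

import json
import xmltodict

with open("test.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

# Assumed structure: <root><record>...</record><record>...</record></root>.
# xmltodict yields a list for a repeated element, but a single dict
# when the element occurs exactly once, hence the isinstance check.
records = data_dict["root"]["record"]
if not isinstance(records, list):
    records = [records]

# Write each record to its own JSON file
for i, record in enumerate(records):
    with open(f"record_{i}.json", "w") as out_file:
        json.dump(record, out_file, indent=2)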
CodePudding user response:
Here is my solution: instead of dumping the whole parsed document as one object, extract the list of records and dump that.
import xmltodict
import json

# Open the XML file in binary mode; xmltodict.parse accepts
# the file object directly, no need to read() it first
with open(r"test.xml", "rb") as xml_file:
    dict_data = xmltodict.parse(xml_file)

# Pull out the list of records rather than the whole document
output_data = dict_data["root"]["course_listing"]

json_data = json.dumps(output_data, indent=2)
print(json_data)

with open("datanew.json", "w") as json_file:
    json_file.write(json_data)
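Note that datanew.json still holds a single top-level JSON array, so importing it as-is reproduces the one-document problem; mongoimport needs the --jsonArray flag to unpack an array into separate documents. Another option is to skip the intermediate file and insert the parsed records directly with pymongo. The sketch below assumes a local MongoDB instance, and the database and collection names (mydb, courses) are placeholders for your own setup.

import xmltodict
from pymongo import MongoClient

with open("test.xml", "rb") as xml_file:
    dict_data = xmltodict.parse(xml_file)

# Same record list as above; guard against the single-record case,
# where xmltodict returns a dict instead of a list
records = dict_data["root"]["course_listing"]
if not isinstance(records, list):
    records = [records]

# Placeholder connection details, replace with your own
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["courses"]

# insert_many creates one MongoDB document per element of the list
collection.insert_many(records)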