Converting XML to JSON for MongoDB

Time:08-19

I am trying to convert an XML document with approximately 2,000 records to JSON so I can upload it to MongoDB. I have written a Python script for the conversion, but when I upload the result, MongoDB stores it as one document containing 2,000 sub-objects instead of 2,000 separate documents. I suspect the problem is in the Python code. Can anyone help?

# Program to convert an XML file to a JSON file

# json is in the Python standard library;
# xmltodict is a third-party package (pip install xmltodict)
import json
import xmltodict


# Open the input XML file and parse it into a Python dictionary
with open("test.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

# Serialize the whole dictionary to a single JSON string
json_data = json.dumps(data_dict)

# Write the JSON data to the output file
with open("data.json", "w") as json_file:
    json_file.write(json_data)

Answer:

I am not sure why you would expect an XML-to-JSON converter to automatically split the XML at "record" boundaries. After all, XML doesn't have a built-in concept of "records" - that's something in the semantics of your vocabulary, not in the syntax of XML.
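To make this concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (not xmltodict). A well-formed XML document has exactly one root element, so converting the whole document yields one top-level object; the "records" are just repeated children. The sample XML and its element names are hypothetical, modeled on the root/course_listing structure used in the other answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <root> wrapping repeated <course_listing> records.
SAMPLE = """<root>
  <course_listing><course>CS101</course><title>Intro</title></course_listing>
  <course_listing><course>CS102</course><title>Data</title></course_listing>
</root>"""

root = ET.fromstring(SAMPLE)

# A well-formed XML document has exactly one root element, so a
# whole-document conversion produces exactly one top-level JSON object.
print(root.tag)  # root

# The "records" are only repeated children; XML syntax does not mark them.
record_tags = [child.tag for child in root]
print(record_tags)  # ['course_listing', 'course_listing']
```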

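The easiest way to split an XML file into multiple files is with a simple XSLT 2.0 stylesheet that writes each record with xsl:result-document. If you use XSLT 3.0, you can perform the JSON conversion in the same step (for example with the xml-to-json() function or the json output method).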
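For readers who would rather stay in Python than use XSLT, here is a minimal sketch of the same split-per-record idea using only the standard library's xml.etree.ElementTree. It writes newline-delimited JSON, one document per line. The sample XML, the element names root/course_listing, the output file name data.ndjson, and the helper functions are hypothetical, and the flattening assumes each record contains only simple child elements without attributes or repeats.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical input: a <root> wrapping repeated <course_listing> records.
SAMPLE = """<root>
  <course_listing><course>CS101</course><title>Intro</title></course_listing>
  <course_listing><course>CS102</course><title>Data</title></course_listing>
</root>"""


def record_to_dict(elem):
    """Flatten one record into {child tag: child text}.

    Assumption: children are simple, non-repeating elements without attributes.
    """
    return {child.tag: (child.text or "").strip() for child in elem}


def xml_to_ndjson_lines(xml_text, record_tag):
    """Return one JSON string per matching record element, in document order."""
    root = ET.fromstring(xml_text)
    return [json.dumps(record_to_dict(rec)) for rec in root.iter(record_tag)]


lines = xml_to_ndjson_lines(SAMPLE, "course_listing")

# Write one JSON document per line (newline-delimited JSON).
with open("data.ndjson", "w") as out:
    out.write("\n".join(lines) + "\n")
```

Assuming a local server, a command such as mongoimport --db mydb --collection courses --file data.ndjson should then create one document per line; mydb and courses are placeholder names.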

Answer:

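Here is my solution. Instead of dumping the whole parsed document, select the list of repeated record elements (here course_listing under the root element), so the output is a JSON array with one object per record.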

import json

import xmltodict

# Open the XML file in binary mode; xmltodict.parse accepts a file object
with open("test.xml", "rb") as xml_file:
    dict_data = xmltodict.parse(xml_file)

# Select the repeated record element so the output is a JSON array
# (one element per record) instead of a single wrapping object
output_data = dict_data["root"]["course_listing"]

json_data = json.dumps(output_data, indent=2)
print(json_data)

with open("datanew.json", "w") as json_file:
    json_file.write(json_data)
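One caveat with selecting dict_data["root"]["course_listing"]: xmltodict maps a repeated element to a list, but when the element occurs only once it maps to a single dict, so the output would be one object rather than an array. Below is a minimal sketch of a guard, using hand-built dictionaries that stand in for xmltodict output (the helper name and field names are hypothetical). Passing xmltodict's own force_list parameter to parse() is another way to always get a list.

```python
import json


def as_record_list(value):
    """Normalize the value at the record path into a list of records.

    xmltodict maps a repeated element to a list, but an element that occurs
    only once maps to a plain dict; wrapping that dict keeps the output a
    JSON array in both cases.
    """
    if isinstance(value, list):
        return value
    return [value]


# Hand-built stand-ins for xmltodict output (hypothetical field names).
many = {"root": {"course_listing": [{"course": "CS101"}, {"course": "CS102"}]}}
one = {"root": {"course_listing": {"course": "CS101"}}}

for parsed in (many, one):
    records = as_record_list(parsed["root"]["course_listing"])
    print(json.dumps(records))
```

Because the output file is a single JSON array, mongoimport needs the --jsonArray flag to import each element as a separate document.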