Reading from CSV, converting to JSON and storing in MongoDB

I am trying to read a CSV file with pandas, convert each row into a JSON object, append them all to a dict, and then store the result in MongoDB.

Here is my code:

import pandas as pd

data = pd.DataFrame(pd.read_csv('data/airports_test.csv'))

for i in data.index:
    json = data.apply(lambda x: x.to_json(), axis=1)
    json_dict = json.to_dict()

print(json_dict[5])

ins = collection.insert_many(json_dict)
# for i in json_dict:
#     ins = collection.insert_one(json_dict[i])

If I print elements of the dict, I get what looks like the correct output. If I try to use collection.insert_many, I get the error 'documents must be a non empty list'. If I try to loop through the dict and insert one document at a time, I get the error:

document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping

I have Googled and Googled but I can't seem to find a solution! Any help would be massively appreciated.

CodePudding user response:

Both errors come from handing PyMongo JSON strings (and a dict of them) when it expects a list of dicts. You can skip processing the individual rows of the DataFrame entirely:

import json
import pandas

data = pandas.read_csv('test2.csv')            # read_csv already returns a DataFrame
records = data.to_dict(orient="records")       # list of dicts, one per row
print(json.dumps(records, indent=4))

Alternatively, as @JNevill points out, you can simplify this further:

import pandas

data = pandas.read_csv('test2.csv')
records = data.to_json(orient="records")       # a single JSON string
print(records)

With either strategy, keep in mind that insert_many expects a list of dicts rather than a JSON string, so at this point you should insert the records list itself:

collection.insert_many(data.to_dict(orient="records"))
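
For completeness, here is a minimal end-to-end sketch of the whole pipeline. The connection string, database, and collection names are placeholders, assuming a local MongoDB instance:

import pandas
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")     # assumed local instance
collection = client["airports_db"]["airports"]        # hypothetical db/collection names

data = pandas.read_csv("data/airports_test.csv")
records = data.to_dict(orient="records")              # list of dicts, one per CSV row
if records:                                           # insert_many rejects an empty list
    result = collection.insert_many(records)
    print(f"Inserted {len(result.inserted_ids)} documents")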

As an aside, I think I would personally use the csv module and csv.DictReader rather than pandas here, but this way is fine.
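
For reference, a rough sketch of that csv.DictReader route, with the same placeholder names as above:

import csv
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["airports_db"]["airports"]

with open("data/airports_test.csv", newline="") as f:
    records = list(csv.DictReader(f))    # each row becomes a dict keyed by the header
if records:
    collection.insert_many(records)

One difference to note: DictReader leaves every value as a string, whereas pandas infers numeric types, so the documents stored in MongoDB will differ accordingly.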
