I am trying to read a CSV file with Pandas, convert each row into a JSON object, collect them in a dict, and then store them in MongoDB.
Here is my code:
import pandas as pd

# `collection` is a pymongo collection set up elsewhere
data = pd.DataFrame(pd.read_csv('data/airports_test.csv'))
for i in data.index:
    json = data.apply(lambda x: x.to_json(), axis=1)
    json_dict = json.to_dict()
print(json_dict[5])
ins = collection.insert_many(json_dict)
# for i in json_dict:
#     ins = collection.insert_one(json_dict[i])
If I print elements of the dict I get the correct output (I think...). If I try to use collection.insert_many, I get the error 'documents must be a non empty list'. If I try to loop through the dict and insert the documents one at a time, I get the error:
document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
I have Googled and Googled but I can't seem to find a solution! Any help would be massively appreciated.
CodePudding user response:
You can skip processing the individual rows of the DataFrame entirely. Note that your json_dict maps each row index to a JSON string, which is why insert_many() rejects it: it expects a list of dicts, not a dict of strings. Instead, convert the whole DataFrame at once:
import json
import pandas

data = pandas.read_csv('test2.csv')   # read_csv already returns a DataFrame
records = data.to_dict(orient="records")   # list of dicts, one per row
print(json.dumps(records, indent=4))       # pretty-print to inspect the records
Alternatively, as @JNevill points out, you can simplify this further via:
import pandas

data = pandas.read_csv('test2.csv')
data = data.to_json(orient="records")   # a JSON string, not a list of dicts
print(data)
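Note that to_json() gives you a JSON string rather than Python objects. If you take this route, you can parse the string back into a list of dicts with json.loads() before inserting; a minimal sketch, reusing the data string from the snippet above:

import json

records = json.loads(data)   # parse the JSON string back into a list of dicts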
With either strategy, you now have records as a list of dicts, which is exactly what insert_many() expects:

collection.insert_many(records)
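Putting it all together, here is a minimal end-to-end sketch. The connection URI, database name, and collection name are assumptions, so substitute your own:

import pandas as pd
from pymongo import MongoClient

# Assumed connection details: replace with your own URI and names.
client = MongoClient('mongodb://localhost:27017')
collection = client['airports_db']['airports']

data = pd.read_csv('test2.csv')
records = data.to_dict(orient='records')   # one dict per CSV row
result = collection.insert_many(records)
print('inserted', len(result.inserted_ids), 'documents')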
As an aside, I think I would personally use the csv module and csv.DictReader rather than pandas here, but this way is fine.
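For reference, a minimal sketch of that csv-based approach (same assumed file and connection details as above). Note that csv.DictReader reads every value as a string, whereas pandas infers numeric types:

import csv
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')   # assumed local instance
collection = client['airports_db']['airports']      # assumed names

# DictReader yields one dict per row, keyed by the CSV header,
# so the rows can go straight into insert_many().
with open('test2.csv', newline='') as f:
    rows = list(csv.DictReader(f))
collection.insert_many(rows)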