I have a very small test database with 5 documents and 4 fields. When I import a new JSON file, I want to update these documents (add new fields and replace old values with new ones) as part of the import.
I was already able to do this when importing a CSV file.
Code for the CSV file:
import time

import pandas as pd

def update_and_add_with_csv(self, data, key):
    """Update all documents in the collection from a CSV file
    (add new columns and change old values). Uses pandas."""
    df = pd.read_csv(data, low_memory=False)
    records = df.to_dict('records')
    start_time = time.time()
    for row in records:
        # Match on the key field; $set merges the row's fields into the
        # document, and upsert=True inserts it when nothing matches.
        self.collection.update_one({key: row.get(key)}, {'$set': row}, upsert=True)
    total_time = time.time() - start_time
    # Elapsed seconds, formatted to three decimals.
    return '{:>.3f}'.format(total_time)
How can I do the same with a JSON file?
The JSON file looks like this:
CodePudding user response:
I think the best way to do this is to replace these documents rather than update them.
I'm assuming your date fields can be used as unique identifiers.
import json
import time

def update_and_add_with_json(self, file_path):
    """Replace documents in the collection using records from a JSON file."""
    # Assumes the file holds a list of objects, each with a "date" field.
    with open(file_path, "r") as file:
        file_data = json.load(file)
    start_time = time.time()
    for record in file_data:
        # Swap the whole matching document for the new record.
        self.collection.find_one_and_replace({"date": record["date"]}, record)
    total_time = time.time() - start_time
    # Elapsed seconds, formatted to three decimals.
    return '{:>.3f}'.format(total_time)
I'm not sure how your JSON file is formatted, but if it matches your schema, this should work. It also makes it easier to add fields dynamically and takes advantage of MongoDB's schemaless design.
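The difference between replacing a document and updating it with `$set` is easy to miss, so here is a minimal sketch that mimics both on plain Python dicts. It does not call MongoDB or pymongo; `apply_set`, `apply_replace`, and the field names `temp` and `humidity` are hypothetical, and it assumes only top-level fields and an incoming record without its own `_id`.

```python
def apply_set(existing, incoming):
    """Mimic {'$set': incoming}: incoming fields are merged into the document."""
    merged = dict(existing)
    merged.update(incoming)
    return merged


def apply_replace(existing, incoming):
    """Mimic a replace: the document becomes the incoming record, keeping only _id."""
    replaced = dict(incoming)
    if "_id" in existing:
        replaced["_id"] = existing["_id"]
    return replaced


if __name__ == "__main__":
    stored = {"_id": 1, "date": "2021-01-01", "temp": 20, "humidity": 55}
    new_row = {"date": "2021-01-01", "temp": 25}
    print(apply_set(stored, new_row))      # humidity is kept
    print(apply_replace(stored, new_row))  # humidity is dropped
```

So if the new file carries only some of the fields, replacing will drop the rest, while `$set` keeps them. Which one you want depends on whether the file is meant to be the complete record.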
CodePudding user response:
Yes, exactly, it works in a similar way. This might be useful to someone:
import json
import time

def update_and_add_with_json(self, data, key):
    """Update all documents in the collection from a JSON file
    (add new fields and change old values)."""
    with open(data) as file:
        file_data = json.load(file)
    start_time = time.time()
    for row in file_data:
        # Match on the key field; $set merges the row's fields into the
        # document, and upsert=True inserts it when nothing matches.
        self.collection.update_one({key: row.get(key)}, {'$set': row}, upsert=True)
    total_time = time.time() - start_time
    # Elapsed seconds, formatted to three decimals.
    return '{:>.3f}'.format(total_time)
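One thing to watch with `row.get(key)`: if a row has no such key, the filter becomes `{key: None}`, and in MongoDB a `None` (null) filter matches documents where that field is null or missing, so the upsert may end up changing an unrelated document. Below is a minimal sketch of a pre-check that could run before the update loop. `load_records` is a hypothetical helper, and it assumes the file holds either a list of objects or a single object.

```python
import json


def load_records(path, key):
    """Return (valid, skipped): rows that carry a non-null `key`, and the rest."""
    with open(path, encoding="utf-8") as file:
        data = json.load(file)
    # Treat a single top-level object as a one-row file.
    rows = data if isinstance(data, list) else [data]
    valid, skipped = [], []
    for row in rows:
        if isinstance(row, dict) and row.get(key) is not None:
            valid.append(row)
        else:
            skipped.append(row)
    return valid, skipped
```

The `valid` list can then be passed to the same `update_one` loop, and `skipped` can be logged or inspected, so that rows without a usable key never reach the database.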