I have a lot of documents to update and I want to write a timestamp initially and then an update timestamp when there are duplicates. So I found this answer and am attempting it for MongoDB 6.0 https://stackoverflow.com/a/17533368/3300927
I also store in my model the name of the field to use when looking for duplicates, as searchable.
If a query has no searchable, I insert the results without checking, then loop over the inserted IDs and add a timestamp to each:
data_inserted = collection.insert_many(results)
for doc_id in data_inserted.inserted_ids:
    collection.update_many(
        filter={'_id': doc_id},
        update={'$set': {'insert_date': now, }, },
        upsert=True)
No issues there:
{
  "_id": {
    "$oid": "321654987654"
  },
  "IR NUMBER": "ABC784",
  "Plate": " ",
  "Plate State": " ",
  "Make": "TOYOTA",
  "Model": "TACOMA",
  "Style": " ",
  "Color": "SIL / ",
  "Year": "2008",
  "insert_date": {
    "$date": {
      "$numberLong": "1660000808176"
    }
  }
}
If there is a searchable, I try to match on it with an upsert. What I get in MongoDB is only the searchable field with the timestamp, not the full document:
# q_statement.searchable == 'IR NUMBER'
for document in results:
    collection.update_one(
        filter={q_statement.searchable: document[q_statement.searchable], },
        update={'$setOnInsert': {'insert_date': now, }, '$set': {'update_date': now, }},
        upsert=True)
result:
{
  "_id": {
    "$oid": "62f19d981aa321654987"
  },
  "IR NUMBER": "ABC784",
  "insert_date": {
    "$date": {
      "$numberLong": "1660001688126"
    }
  }
}
EDIT
Looking at the pymongo.results.UpdateResult, by changing the body of the for loop to updates = collection.update_one(...) followed by print(updates.raw_result), I see about 10k results like:
{
    "n": 1,
    "upserted": ObjectId("62f27ae21aa62fbfa734f01d"),
    "nModified": 0,
    "ok": 1.0,
    "updatedExisting": False
},
{
    "n": 1,
    "nModified": 0,
    "ok": 1.0,
    "updatedExisting": True
},
{
    "n": 1,
    "nModified": 0,
    "ok": 1.0,
    "updatedExisting": True
}
(python==3.10.3, Django==4.0.4, pymongo==4.2.0)
Answer:
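An upsert builds the new document from the equality fields in the filter plus whatever the update operators set, which is why you only got the searchable field and the timestamp. To "upsert" a full document plus additional fields using Python, you can use MongoDB's "$setOnInsert" with a merged Python dictionary.
From the Python library docs, here's how you merge dictionaries (Python 3.9+). (It's similar to MongoDB's "$mergeObjects".)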
d | other
Create a new dictionary with the merged keys and values of d and other,
which must both be dictionaries. The values of other take priority
when d and other share keys.
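For example, a quick sketch of the merge on its own. The document and timestamp values here are made up for illustration:

```python
from datetime import datetime, timezone

# Hypothetical values, for illustration only.
now = datetime(2022, 8, 9, tzinfo=timezone.utc)
document = {"IR NUMBER": "ABC784", "Make": "TOYOTA", "Model": "TACOMA"}

# d | other returns a new dict; neither operand is modified.
merged = document | {"insert_date": now}

# Keys from the right-hand side win when both sides share a key.
overridden = {"Make": "TOYOTA"} | {"Make": "HONDA"}
```

Because the merge returns a new dictionary, the original document is left untouched and can be reused afterwards.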
So, to insert the full document using your Python code, it just needs a minor addition: merge document with your other object.
...
update={'$setOnInsert': document | {'insert_date': now}, '$set': {'update_date': now, }}
...
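Put together, the loop body might look roughly like the sketch below. build_upsert is a hypothetical helper name (it is not part of pymongo), and the sample document is made up. It only builds the filter and update dictionaries, so you can inspect them before sending anything to MongoDB:

```python
from datetime import datetime, timezone


def build_upsert(document, searchable, now):
    """Return (filter, update) dicts for an upsert keyed on `searchable`.

    Hypothetical helper for illustration; not part of pymongo.
    """
    query = {searchable: document[searchable]}
    update = {
        # Applied only when the upsert inserts a new document.
        "$setOnInsert": document | {"insert_date": now},
        # Applied on both insert and update.
        "$set": {"update_date": now},
    }
    return query, update


now = datetime.now(timezone.utc)
document = {"IR NUMBER": "ABC784", "Make": "TOYOTA", "Model": "TACOMA"}
query, update = build_upsert(document, "IR NUMBER", now)

# With a real pymongo collection this would be used as:
# collection.update_one(filter=query, update=update, upsert=True)
```

One thing to watch, assuming your documents might already carry a field named update_date: the same field cannot appear under both "$set" and "$setOnInsert" in one update, since MongoDB rejects that as a conflict, so you would need to drop it from document before merging.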