I have the following structure for one document:
{
    "uid": uid,
    "at": at,
    "url": url,
    "members": members
}
I would like to bulk create the documents, or update them if they already exist based on their uid, since I am passing many objects like the one above in a single request.
I want something that looks like this (in Python):
ids = []
for item in items:
    ids.append(item["uid"])
db.collection.update_many({"_id": {"$in": ids}}, {"$set": items}, upsert=True)
But will it update the correct documents according to the correct ids, and insert the ones whose uid doesn't exist yet? I would like to perform an insert-or-update: whether or not the object already exists, it should either insert it or replace it based on the uid.
Thanks in advance
CodePudding user response:
There is no direct way to upsert a list of keys with a list of contents, e.g. (this is pseudocode, not valid MQL):
keys = ['A','B'];
content = [{key:'A',name:'X',pos:1},{key:'B',name:'Y',pos:2}];
db.collection.updateMany(keys, {$set: content});
The updateMany function is designed to filter a relatively small number of docs and apply a (mostly) constant update pattern to them, which is NOT the desired operation here.
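For contrast, a typical updateMany call applies one constant update to every matched document, e.g. in pymongo (the status field here is purely illustrative, not from the question):
# Assumes `db` is a pymongo Database handle.
# One fixed filter and one fixed update, applied to every match:
db.collection.update_many(
    {"status": "pending"},
    {"$set": {"status": "archived"}},
)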
It is possible to do this easily in a two-step way:
- Load your updates and potential inserts into a temp collection.
- Use the $merge operator to overlay the material onto the target collection.
Consider this existing target collection foo:
db.foo.insert([
    {_id: 0, uid: 0, at: "A", members: [1,2]},
    {_id: 1, uid: 1, at: "B", members: [1,2]},
    {_id: 2, uid: 2, at: "C", members: [1,2]},
    {_id: 3, uid: 3, at: "D", members: [1,2]}
]);
Assume we construct an array of items similar to what the OP posted:
var items = [
{uid:1, at:"XX"}, // change
{uid:3, at:"XX", members:[3,4]}, // change
{uid:7, at:"Q", members:[3,4]}, // new
{uid:8, at:"P", members:[3,4]} // new
];
The following pipeline will merge these onto the target collection:
db.TEMP1.drop(); // OK if does not exist
// Blast items into the TEMP1 collection. You could use the bulk
// insert functions https://docs.mongodb.com/manual/reference/method/Bulk.insert/ if you prefer,
// or in theory even use mongoimport outside of this script to bulk
// load material into TEMP1.
db.TEMP1.insert(items);
c=db.TEMP1.aggregate([
{$project: {_id:false}}, // must exclude _id to prevent conflict with target 'foo'
{$merge: {
into: "foo", // ah HA!
on: [ "uid" ],
whenMatched: "replace",
whenNotMatched: "insert"
}}
]);
db.foo.find();
{ "_id" : 0, "uid" : 0, "at" : "A", "members" : [ 1, 2 ] } // unchanged
{ "_id" : 1, "uid" : 1, "at" : "XX" } // modified and members gone
{ "_id" : 2, "uid" : 2, "at" : "C", "members" : [ 1, 2 ] } // unchanged
{ "_id" : 3, "uid" : 3, "at" : "XX", "members" : [ 3, 4 ] } // modified
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe3"), "uid" : 7, "at" : "Q", "members" : [ 3, 4 ] } // inserted
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe4"), "uid" : 8, "at" : "P", "members" : [ 3, 4 ] } // inserted
The caveat is that a unique index must exist on uid in the target collection, but that is probably what you would want to have anyway for such an operation to be reasonably performant.
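Creating that index is a one-liner, e.g. in pymongo:
# Unique index on uid, required by the $merge on: ["uid"] clause.
db.foo.create_index("uid", unique=True)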
This approach is more client-side data oriented, i.e. it assumes the client side is building up big, complete structures of data to either overlay or insert, as opposed to update, which is more "surgical" in its ability to selectively touch as few fields as possible.
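For example, a surgical update can touch a single field and leave the rest of the document intact, unlike the whenMatched: "replace" behavior above, which dropped the members array for uid 1 (pymongo sketch):
# Changes only "at"; an existing members array would survive.
db.foo.update_one({"uid": 1}, {"$set": {"at": "XX"}})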
For a quantitative comparison, 2 million docs sized roughly as above were loaded into foo. A script using bulk operations loaded 50,000 items into TEMP1, of which half had a matching uid (for update) and half did not (for insert), followed by running the $merge script. On a MacBook Pro with 16GB RAM, an SSD, and a v4.4.2 server, the times were:
1454 millis to insert 50013; 34396 ins/sec
7144 millis to merge; 7000 upd/sec
8598 total millis; 5816 ops/sec
If done in a simple loop instead (one upsert per document; a sketch follows below):
33452 millis for loop update; 1495 upd/sec
So the temp collection and $merge approach is nearly 4X faster.
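For reference, the simple-loop baseline corresponds roughly to one upsert round trip per document, e.g. in pymongo:
# One replace_one call per item: correct, but far slower at scale.
for item in items:
    db.foo.replace_one({"uid": item["uid"]}, item, upsert=True)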