I have the following structure for one document:
{
    "uid": uid,
    "at": at,
    "url": url,
    "members": members
}
I would like to bulk create the documents, or update them if they already exist based on their uid, since I am passing many objects like the one above in a single request.
I want something that looks like this (in Python):
ids = []
for item in items:
    ids.append(item["uid"])
db.collection.update_many({"_id": {"$in": ids}}, {"$set": items}, upsert=True)
But will it update the correct documents according to the correct ids, and insert the ones whose uid doesn't exist yet? I would like to perform an insert-or-update: whether or not the object already exists, it should either insert it or replace it based on the uid.
Thanks in advance
CodePudding user response:
There is no direct way to upsert a list of keys with a list of contents, e.g. (this is pseudocode, not valid MQL):
keys = ['A','B'];
content = [{key:'A',name:'X',pos:1},{key:'B',name:'Y',pos:2}];
db.collection.updateMany(keys, {$set: content});
The updateMany function is designed to filter a relatively small number of docs and apply a (mostly) constant update pattern to them, which is NOT the desired operation here.
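For contrast, a typical updateMany call applies one constant update to every matched document, e.g. in pymongo (the status field here is purely illustrative, not from the question):
# Assumes `db` is a pymongo Database handle.
# One fixed filter and one fixed update, applied to every match:
db.collection.update_many(
    {"status": "pending"},
    {"$set": {"status": "archived"}},
)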
It is possible to do this easily in a two-step way:
- Load your updates and potential inserts into a temp collection.
- Use the $merge operator to overlay the material onto the target collection.
Consider this existing target collection foo:
db.foo.insert([
    {_id: 0, uid: 0, at: "A", members: [1,2]},
    {_id: 1, uid: 1, at: "B", members: [1,2]},
    {_id: 2, uid: 2, at: "C", members: [1,2]},
    {_id: 3, uid: 3, at: "D", members: [1,2]}
]);
Assume we construct an array of items similar to what the OP posted:
var items = [
{uid:1, at:"XX"}, // change
{uid:3, at:"XX", members:[3,4]}, // change
{uid:7, at:"Q", members:[3,4]}, // new
{uid:8, at:"P", members:[3,4]} // new
];
The following pipeline will merge these onto the target collection:
db.TEMP1.drop(); // OK if does not exist
// Blast items into the TEMP1 collection. You could use the bulk
// insert functions https://docs.mongodb.com/manual/reference/method/Bulk.insert/ if you prefer,
// or in theory even use mongoimport outside of this script to bulk
// load material into TEMP1.
db.TEMP1.insert(items);
c=db.TEMP1.aggregate([
{$project: {_id:false}}, // must exclude _id to prevent conflict with target 'foo'
{$merge: {
into: "foo", // ah HA!
on: [ "uid" ],
whenMatched: "replace",
whenNotMatched: "insert"
}}
]);
db.foo.find();
{ "_id" : 0, "uid" : 0, "at" : "A", "members" : [ 1, 2 ] } // unchanged
{ "_id" : 1, "uid" : 1, "at" : "XX" } // modified and members gone
{ "_id" : 2, "uid" : 2, "at" : "C", "members" : [ 1, 2 ] } // unchanged
{ "_id" : 3, "uid" : 3, "at" : "XX", "members" : [ 3, 4 ] } // modified
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe3"), "uid" : 7, "at" : "Q", "members" : [ 3, 4 ] } // inserted
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe4"), "uid" : 8, "at" : "P", "members" : [ 3, 4 ] } // inserted
The caveat is that a unique index must exist on uid in the target collection, but that is probably what you would want to have anyway for such an operation to be reasonably performant.
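Creating that index is a one-liner, e.g. in pymongo:
# Unique index on uid, required by the $merge on: ["uid"] clause.
db.foo.create_index("uid", unique=True)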
This approach is more client-side data oriented, i.e. it assumes the client side is building up big, complete structures of data to either overlay or insert, as opposed to update, which is more "surgical" in its ability to selectively touch as few fields as possible.
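For example, a surgical update can touch a single field and leave the rest of the document intact, unlike the whenMatched: "replace" behavior above, which dropped the members array for uid 1 (pymongo sketch):
# Changes only "at"; an existing members array would survive.
db.foo.update_one({"uid": 1}, {"$set": {"at": "XX"}})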
For a quantitative comparison, 2 million docs sized roughly as above were loaded into foo. A script using bulk operations loaded 50,000 items into TEMP1, of which half had a matching uid (for update) and half did not (for insert), followed by running the $merge script. On a MacBook Pro with 16GB RAM, an SSD, and a v4.4.2 server, the times were:
1454 millis to insert 50013; 34396 ins/sec
7144 millis to merge; 7000 upd/sec
8598 total millis; 5816 ops/sec
If done in a simple loop instead (one upsert per document; a sketch follows below):
33452 millis for loop update; 1495 upd/sec
So the temp collection and $merge approach is nearly 4X faster.
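For reference, the simple-loop baseline corresponds roughly to one upsert round trip per document, e.g. in pymongo:
# One replace_one call per item: correct, but far slower at scale.
for item in items:
    db.foo.replace_one({"uid": item["uid"]}, item, upsert=True)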