Home > front end >  MongoDB aggregation exceeds time limit and never ends
MongoDB aggregation exceeds time limit and never ends

Time:03-09

I have a collection users with documents like this:

{
  _id: "3889",
  code: "3889",
  name: "bla bla"
}

(Note that _id and code always have same values.)

Some days ago, schema has changed and now the _id is not a string anymore, but an object.

{
  _id: {
    code: "4003"
  },
  code: "4003",
  name: "ble ble"
}

(Note that code is still duplicated and it has to be like that.)

Currently, there exist both formats in the collection and I'm trying to "delete safely" documents that matches old schema (those where _id is a string) and there exists an equivalent document (having the same code) matching the new schema.

In pseudo-code would be something like this:

usersWithOldSchema = SELECT * FROM USERS WHERE _id = code
for user of usersWithOldSchema {
  userWithNewSchema = SELECT * FROM USERS WHERE _id.code = user.code
  if userWithNewSchema != null {
    deleteFromDatabase(user)
  }
}

I have tried with this aggregation to select users to delete:

[{
    // 1. Select users with old schema
    $match: {
        '_id.code': {
            $exists: false
        }
    }
}, {
    // 2. "Join" with the collection itself and save in "userWithNewSchema" the equivalent user
    $lookup: {
        from: 'users',
        localField: 'code',
        foreignField: '_id.code',
        as: 'userWithNewSchema'
    }
}, {   
    // 3. If array is empty, it means no equivalent user with new schema was found, so this document should be deleted
    $match: {
        'userWithNewSchema': {
            $size: 0
        }
    }
}]

It doesn't work. Operation exceeds time limit and never ends, no matter how much timeout is configured. I suspect it's not a timeout problem. How can I solve this?

CodePudding user response:

Maybe something simple like that can do the job:

    db.users.find({
                  '_id.code': {
                              $exists: false
                   }
                  }).forEach(function(s){
                   var x=db.users.count({ "_id.code":s._id });
                   if(x==1){ db.users.remove({_id:s._id});print("removed: " s._id)  }
                })
            })

And afcourse good to have index on "_id.code"

  • Related