Override Mongoose save method to retry on `duplicate key error`

Time:01-30

My Mongoose schema uses a custom _id value, and the code I inherited does something like this:

const sampleSchema = new mongoose.Schema({
  _id: String,
  key: String,
});

sampleSchema.statics.generateId = async function() {
  let id;
  do {
    id = randomStringGenerator.generate({length: 8, charset: 'hex', capitalization: 'uppercase'});
  } while (await this.exists({_id: id}));
  return id;
};

let SampleModel = mongoose.model('Sample', sampleSchema);

A simple usage looks like this:

let mySample = new SampleModel({_id: await SampleModel.generateId(), key: 'a' });
await mySample.save();

There are at least three problems with this:

  • Every save will require at least two trips to the database, one to test for a unique id and one to save the document.
  • For this to work, it is necessary to manually call generateId() before each save. An ideal solution would handle that for me, like Mongoose does with ids of type ObjectId.
  • Most significantly, there is a potential race condition that will result in a duplicate key error. Consider two clients running this code. Both happen to generate the same id at the same time, both query the database and find the id absent, and both try to write the record. The second write will fail.
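
The race is easy to reproduce without a database. This is a minimal sketch, assuming a hypothetical in-memory stand-in (createFakeCollection) for a collection with a unique index on _id; none of these names come from Mongoose.

```javascript
// Hypothetical in-memory stand-in for a collection with a unique _id index.
function createFakeCollection() {
  const ids = new Set();
  return {
    async exists(id) {
      return ids.has(id);
    },
    async insert(id) {
      if (ids.has(id)) {
        const error = new Error(`E11000 duplicate key error: ${id}`);
        error.code = 11000;
        throw error;
      }
      ids.add(id);
    },
  };
}

// Mirrors the inherited pattern: check for the id first, then write.
async function checkThenInsert(collection, id) {
  if (await collection.exists(id)) return 'taken';
  await collection.insert(id);
  return 'saved';
}

// Two "clients" pick the same id. Both existence checks complete before
// either insert runs, so the second insert hits the unique constraint.
async function demoRace() {
  const collection = createFakeCollection();
  return Promise.allSettled([
    checkThenInsert(collection, 'DEADBEEF'),
    checkThenInsert(collection, 'DEADBEEF'),
  ]);
}
```

In this sketch the first call settles as fulfilled and the second as rejected with code 11000, even though both passed the existence check.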

An ideal solution would, on save, generate an id and try to store the document; on a duplicate key error, it would generate a new id and retry, looping until the document is stored successfully. The trouble is, I don't know how to get Mongoose to let me do this.
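
Stripped of Mongoose, the loop I have in mind looks roughly like this. It is only a sketch: generate and insert are hypothetical callbacks, and insert is assumed to reject with code 11000 on a duplicate key.

```javascript
// `generate` and `insert` are hypothetical callbacks, not Mongoose APIs.
// `insert` is assumed to reject with `code === 11000` on a duplicate key.
async function insertWithFreshId(generate, insert) {
  for (;;) {
    const id = generate();
    try {
      await insert(id);
      return id; // stored on this attempt
    } catch (error) {
      if (error.code !== 11000) throw error; // not a collision: surface it
      // Collision: fall through and try again with a new id.
    }
  }
}

// Forced collisions using a fixed candidate sequence and an in-memory set.
async function demoRetry() {
  const candidates = ['a', 'a', 'a', 'b'];
  let index = 0;
  const stored = new Set();
  const generate = () => candidates[index++];
  const insert = async (id) => {
    if (stored.has(id)) {
      throw Object.assign(new Error('E11000 duplicate key error'), { code: 11000 });
    }
    stored.add(id);
  };
  const first = await insertWithFreshId(generate, insert);
  const second = await insertWithFreshId(generate, insert);
  return [first, second];
}
```

Note that this version loops forever if every candidate id is taken, so a real implementation would want an upper bound on attempts.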

Here's what I tried. Based on this SO question, I found a rather old sample (using a very old Mongoose version) that overrides the save function to accomplish something similar, and I based this attempt on it.

// First, change generateId() to force a collision
let ids = ['a', 'a', 'a', 'b'];
let index = 0;
let generateId = function() {
  return ids[index++];
};

// Configure middleware to generate the id before a save
sampleSchema.pre('validate', function(next) {
  if (this.isNew)
    this._id = generateId();
  next();
});

// Now override the save function
SampleModel.prototype.save_original = SampleModel.prototype.save;
SampleModel.prototype.save = function(options, callback) {
  let self = this;
  let retryOnDuplicate = function(err, savedDoc) {
    if (err) {
      if (err.code === 11000 && err.name === 'MongoError') {
        self.save(options, retryOnDuplicate);
        return;
      }
    }
    if (callback) {
      callback(err, savedDoc);
    }
  };
  return self.save_original(options, retryOnDuplicate);
}

This gets me close but I'm leaking a promise and I'm not sure where.

let sampleA = new SampleModel({key: 'a'});
let sampleADoc = await sampleA.save();
console.log('sampleADoc', sampleADoc); // prints undefined, but should print the document
let sampleB = new SampleModel({key: 'b'});
let sampleBDoc = await sampleB.save();
console.log('sampleBDoc', sampleBDoc); // prints undefined, but should print the document
let all = await SampleModel.find();
console.log('all', all); // prints `[]`, but should be an array of two documents

Output

sampleADoc undefined
sampleBDoc undefined
all []

The documents eventually get written to the database, but not before the console.log calls are made.

Where am I leaking a promise? Is there an easier way to do this that addresses the three problems I outlined?

Edit 1: Mongoose version: 5.11.15

Answer:

I fixed the problem by changing the save override. The full solution looks like this:

const sampleSchema = new mongoose.Schema({
  _id: String,
  color: String,
});

let generateId = function() {
  return randomStringGenerator.generate({length: 8, charset: 'hex', capitalization: 'uppercase'});
};

sampleSchema.pre('validate', function() {
  if (this.isNew)
    this._id = generateId();
});

let SampleModel = mongoose.model('Sample', sampleSchema);

SampleModel.prototype.save_original = SampleModel.prototype.save;
SampleModel.prototype.save = function(options, callback) {
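  let self = this;

  // Mongoose also accepts save(callback), where the callback arrives as the
  // first argument. Normalize so that form gets the retry behavior too.
  if (typeof options === 'function') {
    callback = options;
    options = undefined;
  }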

  let isDupKeyError = (error, field) => {
    // Determine whether the error is a duplicate key error on the given field
    // Optional chaining on keyValue avoids a TypeError when the error has no keyValue
    return error?.code === 11000 && error?.name === 'MongoError' && error?.keyValue?.[field] !== undefined;
  }

  let saveWithRetries = (options, callback) => {
    // save() returns undefined if used with callback or a Promise otherwise.
    // https://mongoosejs.com/docs/api/document.html#document_Document-save
    let promise = self.save_original(options, callback);
    if (promise) {
      return promise.catch((error) => {
        if (isDupKeyError(error, '_id')) {
          return saveWithRetries(options, callback);
        }
        throw error;
      });
    }
  };

  let retryCallback;
  if (callback) {
    retryCallback = (error, saved, rows) => {
      if (isDupKeyError(error, '_id')) {
        saveWithRetries(options, retryCallback);
      } else {
        callback(error, saved, rows);
      }
    }
  }

  return saveWithRetries(options, retryCallback);
}

This generates a new _id on each attempt until a save succeeds, and it addresses the three problems outlined in the original question:

  • The minimum number of trips to the database has been reduced from two to one. Of course, if there are collisions, more trips will occur, but that's the exceptional case.
  • This implementation generates the id itself, with no manual step to take before saving. That reduces complexity, and callers no longer need to know about a prerequisite for saving, as they did with the original method.
  • The race condition has been addressed. It won't matter if two clients attempt to use the same key. One will succeed and the other will generate a new key and save again.
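
As for why the original attempt printed undefined: that override always passed a callback to save_original, and as the comment in the code above notes, save() returns undefined when it is given a callback. The await therefore had nothing to wait on. Here is a minimal sketch of the effect, using a hypothetical dualModeSave stand-in rather than Mongoose itself:

```javascript
// Hypothetical stand-in for a dual-mode API: with a callback it returns
// undefined, without one it returns a promise.
function dualModeSave(doc, callback) {
  const work = new Promise((resolve) => setTimeout(() => resolve(doc), 10));
  if (callback) {
    work.then((saved) => callback(null, saved));
    return undefined;
  }
  return work;
}

async function demoDualMode() {
  const log = [];
  const viaCallback = await dualModeSave({ key: 'a' }, () => log.push('callback ran'));
  // `await undefined` resumes right away, before the callback has fired.
  log.push(`awaited: ${viaCallback}`);
  const viaPromise = await dualModeSave({ key: 'b' });
  // Here the await really waits for the simulated write to finish.
  log.push(`awaited: ${viaPromise.key}`);
  return log;
}
```

In the sketch, the first await resumes before the callback runs, which matches the symptom in the question: the documents are written eventually, but only after the console.log calls.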

To improve this:

  • There ought to be a maximum number of save attempts for a single document, after which the save fails. If you hit that limit, you may have used up all the available keys in whatever id space you're using.
  • The unique field may not be named _id or you might have multiple fields that require a unique generated value. The embedded helper function isDupKeyError() could be updated to look for multiple keys. Then on error you could add logic to regenerate just the failed key.
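
Both improvements could be sketched along these lines. The names collidingFields and saveWithLimit, and the save and regenerate callbacks, are hypothetical and not part of Mongoose; the sketch assumes the error carries a keyValue object as in the code above, and it checks only the error code, since the error name can differ between driver versions.

```javascript
// Returns the names from `fields` that appear in a duplicate key error,
// or an empty array when the error is something else.
function collidingFields(error, fields) {
  if (error?.code !== 11000) return [];
  const keyValue = error.keyValue ?? {};
  return fields.filter((field) => field in keyValue);
}

// `save` performs one save attempt and `regenerate(field)` assigns a fresh
// value to one generated field. Both are supplied by the caller.
async function saveWithLimit(save, regenerate, fields, maxAttempts = 5) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await save();
    } catch (error) {
      const collided = collidingFields(error, fields);
      if (collided.length === 0) throw error; // unrelated failure
      lastError = error;
      // Regenerate only the fields that actually collided.
      for (const field of collided) regenerate(field);
    }
  }
  const failure = new Error(`Gave up after ${maxAttempts} duplicate key errors`);
  failure.cause = lastError;
  throw failure;
}
```

With a Mongoose document this might be called as saveWithLimit(() => doc.save(), (field) => { doc[field] = generateId(); }, ['_id']), with the pre('validate') hook changed so that it does not overwrite the regenerated value.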