Performance difference when inserting 100 records in parallel in MongoDB


I want to insert 100 or more documents into a MongoDB collection in parallel. We are using Atlas M30 servers.

PS: I cannot do a bulk insert in my use case - there are quite a few pre-processing steps before the insert.


import { MongoClient } from 'mongodb';

let mc = new MongoClient("mongodb://localhost:27017/");


const getConnection = async function () {

    let client = await mc.connect();
    let db = client.db('testpeformance');
    let collection = db.collection("candidate");
    return collection;
}
const insertToMongo = async function (collection, index) {
    const startTime = new Date();
    const result = await collection.insertOne({
        "FirstName": "TESTPERF",
        "LastName": "TESTPERF",
        "Designation": "TESTPERF",
        "EmailId": "[email protected]"
    });
    const endTime = new Date();
    var insertedId = result.insertedId;
    console.log(`Inserting record [${index}] - ${insertedId} took ${endTime.getTime() - startTime.getTime()}ms`)
}




const run = async function () {
    // start all inserts immediately in parallel and wait for all of them to resolve
    let connection = await getConnection()
    console.time('overall');
    for (let topIndex = 0; topIndex < 1; topIndex++) {
        let promises: Promise<any>[] = [];
        for (let index = 0; index < 100; index++) {

            let apipromise = insertToMongo(connection, index);
            promises.push(apipromise);
        }
        await Promise.all(promises);

    }
    console.timeEnd('overall')
};
run();

When I run the same code for fewer than 10 documents, each insert takes about 30 ms. But when I increase the number of documents to 100 or more, the time taken to insert each record goes up to 300-400 ms.

CodePudding user response:

I suppose you are using MongoDB Community Edition on your system. Are you trying to run the same script with different parameters at the same time to insert multiple documents in parallel?

I can't point to a specific issue, as more details, load testing, and logs would be needed to establish your system's limits, but here are some things to check:

  • It could be related to the number of open connections available in your MongoDB connection pool (see the sketch after this list).
  • Can your RAM and CPU handle this many concurrent operations?

Try breaking the test into smaller steps: first measure how many connections you can open at once, then how many inserts you can run at once, and continue testing from there.
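
For example, the pool size can be set explicitly when the client is constructed. This is a minimal sketch; maxPoolSize and minPoolSize are standard Node.js driver options, but the URI and the values used here are just assumptions to test against:

import { MongoClient } from 'mongodb';

// Sketch only: the URI and pool sizes are placeholders, not values from your setup.
const client = new MongoClient("mongodb://localhost:27017/", {
    maxPoolSize: 200, // upper bound on concurrent connections the driver will open
    minPoolSize: 10   // keep a few connections warm between batches
});
await client.connect();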

CodePudding user response:

There are two main bottlenecks to consider in this case: the parallelism of your code and the database itself.


Let's look at your code first. While you do "batch" your work, the batching is inconsistent: for the first few milliseconds you achieve parallelism because all 100 inserts are issued at once, but as they start resolving you gradually lose that concurrency.

The last insert in each batch always ends up executing on its own, which is quite limiting when you want to maximise performance.

I personally like bluebird, which lets me execute many promises while still controlling concurrency. Here is some pseudo code:

import * as Bluebird from 'bluebird';

const insertsRequired = 5000;
// Run all inserts, but keep at most 100 of them in flight at any moment.
await Bluebird.map([...Array(insertsRequired).keys()], (each) => {
    return insertToMongo(connection, each)
}, { concurrency: 100 })

Assuming the "compute" in insertToMongo is not consuming all of your machine's resources, there is no need to parallelise the code any further. If it does, you should spread this work across multiple machines.
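
If you would rather not add bluebird as a dependency, a rough equivalent is to run a fixed number of worker loops over a shared counter. This is only a sketch reusing insertToMongo and the connection from your code; the function name and the numbers are illustrative:

import type { Collection } from 'mongodb';

// Sketch: cap concurrency by starting `concurrency` workers that each pull the next index.
async function insertWithConcurrency(collection: Collection, total: number, concurrency: number) {
    let next = 0;
    const worker = async () => {
        while (next < total) {
            const index = next++; // claimed synchronously, so two workers never take the same index
            await insertToMongo(collection, index);
        }
    };
    // Start the workers and wait until the shared counter is exhausted.
    await Promise.all(Array.from({ length: concurrency }, () => worker()));
}

// Usage: 5000 inserts with at most 100 in flight at any moment.
await insertWithConcurrency(connection, 5000, 100);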


The second issue is the DB bottleneck: no matter what you do in your code, there will be hardware limitations. Scaling up the database obviously helps, but there is one specific limitation you need to consider.

In MongoDB, each collection effectively has a single writer; this is how ACID guarantees are maintained at the single-document level. This means that no matter how big the DB is, this will always be the real bottleneck for mass writes into a single collection.

This also means you need to consider insert performance, specifically the indexes on the collection: keep indexes as narrow as possible and avoid array/unique indexes where you can (see the sketch below).

The scale of inserts you're describing should not run into these limits, but they are good to keep in mind.
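
As an illustration, the driver lets you see what the collection has to maintain on every insert, and a narrow single-field index looks like this. This is a sketch only; EmailId is taken from the sample document, and whether it should be indexed at all depends on your queries:

// Sketch: list the indexes that must be updated on every insert...
const existingIndexes = await collection.indexes();
console.log(existingIndexes);

// ...and, where a lookup field is genuinely needed, prefer a narrow single-field index
// over compound, array or unique indexes.
await collection.createIndex({ EmailId: 1 });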
