How to improve the performance of $in in MongoDB?


When I query MongoDB with $in and the list of items is small, the query is fast. But when I query with a large list, it is slow.

How to improve performance?

For example:

I have 2 collections:

  1. Books
for (var i = 0; i < 1000000; i++) {
    db.books.insert({name: "book" + i})
}
  2. Users. Let's say user1 can only see books whose _id is in a bucket, so when we fetch user1's book list, we query like this:
// a. find all book ids that user1 can see
var books = db.user_access.findOne({uid: "user1"}, {books: 1}).books;

// books: ["id1", "id2", "id3" ... "idN"]

// b. query with the access list
db.books.find({$in: {_id: books}}).limit(10).skip(0) // pagination

We query 10 docs from a large collection with a complex condition. How can we improve performance, through the design or the queries?

CodePudding user response:

  • Limit and skip the book list in your application before you send the query to the database. Send only book1~book10 first, then book11~book20, and so on.

  • Use keyset pagination: remember what the last fetched item was and fetch the next page starting after that item (see the sketch after this list).
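
For example, a minimal keyset-pagination sketch with the Node.js driver (fetchPage, userBookIds, and pageSize are illustrative names, not from the question):

// Keyset pagination: remember the last _id returned and continue after it,
// instead of making the server walk and discard skipped documents.
async function fetchPage(booksCollection, userBookIds, lastSeenId, pageSize) {
  const filter = lastSeenId
    ? { _id: { $in: userBookIds, $gt: lastSeenId } } // pages after the first
    : { _id: { $in: userBookIds } };                 // first page
  return booksCollection
    .find(filter)
    .sort({ _id: 1 }) // a stable sort key is required for keyset pagination
    .limit(pageSize)
    .toArray();
}

// Usage: fetch page 1, then page 2 starting after the last _id of page 1.
// const page1 = await fetchPage(booksCollection, userBookIds, null, 10);
// const last = page1[page1.length - 1]._id;
// const page2 = await fetchPage(booksCollection, userBookIds, last, 10);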

CodePudding user response:

I've got working code, and will give you some explanation:

const { MongoClient } = require('mongodb')

const uri = 'mongodb://127.0.0.1:27017/';
const dbName = 'stackoverflow_q69548755';
const client = new MongoClient(uri);

async function run(callback) {
  try {
    await client.connect();
    console.log('Connected successfully to server');

    await callback();
  } finally {
    await client.close();
    console.log('Connection closed')
  }
}

async function runOperation() {
  const db = client.db(dbName);

  // users and user_books (from a cart) should go in different collections
  // ! collections can be accessed later via api like: db.user_access..., db.books...
  const usersCollection = db.collection('user_access'); // user details with books assigned
  const booksCollection = db.collection('books');
  const books = [];
  const userBooks = [];
  const totalBooks = 1000000;
  const totalUserBooks = 10;

  // * create user
  const userDocument = await usersCollection.insertOne({ uid: 'user1' });
  console.log(`User ${userDocument.insertedId} created`)

  // * create books
  for (let i = 0; i < totalBooks; i++) {
    books.push({ name: "book" + i });
  }
  const bookDocuments = await booksCollection.insertMany(books);
  console.log(`${books.length} books created`)

  // * assign a few books to the user
  const bookDocumentIds = Object.values(bookDocuments.insertedIds);
  for (let i = 0; i < totalUserBooks; i++) {
    userBooks.push(bookDocumentIds[pickRandomBook(totalBooks)]);
  }
  await usersCollection.updateOne(
    { _id: userDocument.insertedId },
    { $set: { books: userBooks } }
  );
  console.log(`User took ${totalUserBooks} books randomly from ${totalBooks} in the collection:\n\t${userBooks.join(', ')}`)

  // * find user books from books collection
  // we know which books are from the user!
  const bookFilteredDocuments = await booksCollection.find(
    { _id: { $in: userBooks } }
    ).skip(0).limit(totalUserBooks).toArray();
  const bookFilteredDocumentIds = bookFilteredDocuments.map(document => document._id);
  console.warn(`Books filtered: \n\t${bookFilteredDocumentIds.join(', ')}`)

  console.log('Done');
}

function pickRandomBook(maxBooks) {
  // expected outputs are [0, 1, ..., maxBooks - 1]
  return Math.floor(Math.random() * maxBooks);
}

run(runOperation).catch(console.dir);

I suppose you're using MongoClient to access your collections. I left some comments in the code, but I'll also write down a few things here:

  • when using a find() filter, the filter parameter expects the property name first, then the operator, i.e. {_id: {$in: books}} instead of {$in: {_id: books}}

  • when using pagination, the order of operations is:

    1. sort
    2. skip
    3. limit

    You swapped the order of the skip and limit operations (see the sketch after this list).
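
Putting both points together, here is a sketch of the corrected query (assuming the booksCollection and userBooks variables from the code above):

// Correct filter shape: { field: { operator: value } }, then sort/skip/limit.
const page = await booksCollection
  .find({ _id: { $in: userBooks } }) // property name first, then $in
  .sort({ _id: 1 })                  // 1. sort
  .skip(0)                           // 2. skip
  .limit(10)                         // 3. limit
  .toArray();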

I don't know why the query on your site is so slow; running it locally against a local database, I get the result within a second or two. I suggest you analyze my code. I think the main problem is the one in the first point above. If your query is still much slower than a second or two, can you be more specific, please? If my solution helped you, write down what exactly helped.
