When I use a MongoDB query with $in, it is quick when the list of items is small, but slow when the list is large. How can I improve the performance?
For example:
I have 2 collections:
- Books
for (var i = 0; i < 1000000; i++) {
    db.books.insert({name: "book" + i})
}
- Users. Let us say user1 can only see books whose _id is in a bucket, so to find user1's book list we query like this:
// a. find all books ids that user1 can see,
var books = db.user_access.findOne({uid: "user1"}, {books: 1}).books;
// books: ["id1","id2","id3" ... "idN"]
// b. query with the access.
db.books.find({$in: {_id: books}}).limit(10).skip(0) // pagination.
We query 10 docs from a large collection with a complex condition.
How can we improve performance, from the design or the queries?
CodePudding user response:
Limit and skip the book list in your application before you send the query to the database: send only book1~book10 first, then book11~book20, and so on.
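For illustration, a minimal sketch of that idea with the Node.js driver (the getUserBooksPage name and the page/pageSize parameters are my own, not from the question):

// Sketch: slice the id list in the application, then query only that slice,
// so the $in list stays at pageSize ids instead of N.
async function getUserBooksPage(db, uid, page, pageSize = 10) {
  const user = await db.collection('user_access').findOne(
    { uid },
    { projection: { books: 1 } }
  );
  const pageIds = user.books.slice(page * pageSize, (page + 1) * pageSize);
  return db.collection('books').find({ _id: { $in: pageIds } }).toArray();
}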
Use Keyset Pagination, tell what the last fetched item is and get the next page after that item.
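A minimal sketch of keyset pagination under the same assumptions (lastSeenId is the _id of the last document from the previous page; pass null for the first page):

// Sketch: keyset pagination, no skip at all.
async function nextBooksPage(db, bookIds, lastSeenId, pageSize = 10) {
  const filter = lastSeenId
    ? { _id: { $in: bookIds, $gt: lastSeenId } } // continue after the last seen item
    : { _id: { $in: bookIds } };                 // first page
  return db.collection('books')
    .find(filter)
    .sort({ _id: 1 }) // a stable sort order is required for keyset pagination
    .limit(pageSize)
    .toArray();
}

This avoids skip entirely, which otherwise has to walk past every skipped document on each page.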
CodePudding user response:
I've got working code, and will give you some explanation:
const { MongoClient } = require('mongodb')
const uri = 'mongodb://127.0.0.1:27017/';
const dbName = 'stackoverflow_q69548755';
const client = new MongoClient(uri);

async function run(callback) {
    try {
        await client.connect();
        console.log('Connected successfully to server');
        await callback();
    } finally {
        await client.close();
        console.log('Connection closed')
    }
}

async function runOperation() {
    const db = client.db(dbName);
    // users and user_books (from a cart) should go in different collections
    // ! collections can be accessed later via an API like: db.user_access..., db.books...
    const usersCollection = db.collection('user_access'); // user details with books assigned
    const booksCollection = db.collection('books');
    const books = [];
    const userBooks = [];
    const totalBooks = 1000000;
    const totalUserBooks = 10;

    // * create user
    const userDocument = await usersCollection.insertOne({ uid: 'user1' });
    console.log(`User ${userDocument.insertedId} created`)

    // * create books
    for (let i = 0; i < totalBooks; i++) {
        books.push({ name: 'book' + i });
    }
    const bookDocuments = await booksCollection.insertMany(books);
    console.log(`${books.length} books created`)

    // * assign a few books to the user
    const bookDocumentIds = Object.values(bookDocuments.insertedIds);
    for (let i = 0; i < totalUserBooks; i++) {
        userBooks.push(bookDocumentIds[pickRandomBook(totalBooks)]);
    }
    await usersCollection.updateOne(
        { _id: userDocument.insertedId },
        { $set: { books: userBooks } }
    );
    console.log(`User took ${totalUserBooks} books randomly from ${totalBooks} in the collection:\n\t${userBooks.join(', ')}`)

    // * find user books from books collection
    // we know which books belong to the user!
    const bookFilteredDocuments = await booksCollection.find(
        { _id: { $in: userBooks } }
    ).skip(0).limit(totalUserBooks).toArray();
    const bookFilteredDocumentIds = bookFilteredDocuments.map(document => document._id);
    console.warn(`Books filtered: \n\t${bookFilteredDocumentIds.join(', ')}`)
    console.log('Done');
}

function pickRandomBook(maxBooks) {
    // expected outputs are [0, 1, ..., maxBooks - 1]
    return Math.floor(Math.random() * maxBooks);
}

run(runOperation).catch(console.dir);
I suppose you're using MongoClient to access your collections. I left some comments in the code, but will also write down a few things here:
When using find(), the filter parameter expects the property name first and then the operator, i.e. {_id: {$in: books}} instead of {$in: {_id: books}}.
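Written out, the corrected version of the query from the question:

// incorrect: operator first, field nested inside
db.books.find({$in: {_id: books}})
// correct: field first, operator nested inside
db.books.find({_id: {$in: books}}).skip(0).limit(10)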
When using pagination, the order of operations is:
- sort
- skip
- limit
You swapped the order of skip and limit operations.
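As a sketch, here is the page query from the code above written in that order (page and pageSize are illustrative additions, not part of the original code):

const page = 0, pageSize = 10;
const pagedBooks = await booksCollection
    .find({ _id: { $in: userBooks } })
    .sort({ _id: 1 })      // 1. sort on an indexed field for a stable order
    .skip(page * pageSize) // 2. skip whole pages
    .limit(pageSize)       // 3. cap the page size
    .toArray();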
I don't know why the query on your side is so slow; running my code locally against a local database, I got the result within a second or two. I suggest you analyze it. I think the main problem is the filter syntax issue I pointed out first. If your query is still too slow, say much more than a second or two, can you be more specific, please? If my solution helped you, write down what exactly helped.