I've tried both methods (cursor and offset) to paginate through my ~1000 doc DB, however, I can't get cursor to be faster than offset.
// cursor method
const allDocs = []
let queryResults = await prismaClient.obsidian.findMany({
take: 100
})
while (queryResults.length > 0) {
allDocs.push(...queryResults)
queryResults = await prismaClient.obsidian.findMany({
take: 100,
skip: 1,
cursor: {
id: queryResults[queryResults.length - 1]!.id
}
})
}
return allDocs
takes ~25 seconds to complete (NB: also it seems to be using OFFSET
, I thought it wasn't supposed to do that?)
Query: SELECT "public"."obsidian"."id", "public"."obsidian"."doc", "public"."obsidian"."createdAt", "public"."obsidian"."updatedAt", "public"."obsidian"."filename" FROM "public"."obsidian", (SELECT "public"."obsidian"."id" AS "obsidian_id_0" FROM "public"."obsidian" WHERE ("public"."obsidian"."id") = ($1)) AS "order_cmp" WHERE "public"."obsidian"."id" >= "order_cmp"."obsidian_id_0" ORDER BY "public"."obsidian"."id" ASC LIMIT $2 **OFFSET** $3
// offset method
const count = await prismaClient.obsidian.count()
const pages = Math.ceil(count / 100)
const allDocsPromises = []
for (let i = 0; i < pages; i ) {
const page = prismaClient.obsidian.findMany({
skip: i * 100,
take: 100
})
allDocsPromises.push(page)
}
const allDocs = await Promise.all(allDocsPromises)
return _.flatten(allDocs)
takes ~14 seconds to complete. My thought is that offset
method is running faster due to Promise.all()
but is there a way to get cursor pagination to return all the documents super fast?
CodePudding user response:
First, cursor based pagination does not use OFFSET, in the doc mentions that.
Second, your thought about the Promise.all
is ok. Use Promise.all
for parallel processing and use for
for sequential processing (I think that the problem lies with sequential processing).
Finally, here you go a post about the benefits of cursor based pagination, enjoy it :)
CodePudding user response:
What Prisma calls cursor pagination, others call 'key set' pagination. There is no actual cursor involved. But how efficient that is is going to be depends on the rest of the query, and on having the right index(es), neither of which we can assess with the info provided. Since you are reading all the data into memory on a tight loop anyway, why not just read it and omit the pagination?