new count() aggregation function performance-CodePudding

Firestore introduced the new aggregation query count() which is pretty useful for me. I'm trying to understand the speed and cost of it.

The documentation mentions performance:

Performance depends on your index configuration and on the size of the dataset.

and

Most queries scale based on the on the size of the result set, not the dataset. However, aggregation queries scale based on the size of the dataset and the number of index entries scanned.

and also the pricing:

you are charged one document read for each batch of up to 1000 index entries matched by the query.

Now imagine I have 2 collections, the first one with 100'000 documents, and the second one with 1'000'000 documents. I run a someQuery.count() on both collections and both return 1500. Here are some questions:

Will both queries be charged the same (2 document reads)?
Will the second query take longer than the first query, since the second collection has more documents? If yes - how much longer (linear, log, etc.)?
What does the documentation mean by performance depends on your index configuration? Can I configure the collections and their indexes to make count work faster?

Would be great to get some answers, I'm relying on count heavily (I know, it's still in preview).

CodePudding user response：

The charge for COUNT() queries is based on the number of index entries that are matched by that query. Essentially:

take the count you get back,
divide by 1000, rounding up,
if the result is 0, add 1.

That's the number of document reads you're charged.

The performance of a COUNT() query depends on the number of items you count. Counting more items will take longer as the database has to match a large number of index entries.

This is similar to if you'd actually retrieve the documents rather than just their count: getting more documents would also take more time. Of course: counting the documents will be faster than actually getting them, but counting more documents still takes more time than counting fewer documents.

Some queries may be possible even when you don't have a specific index for the exact field combination you use, due to Firestore's ability to perform a zig-zag-merge-join across the indexes (that feature might have a different name now, but it's pretty much still doing something similar). That reduces the number of indexes you'd need for a database, and thus reduce the associated cost. It could also affect the performance of the count, but not in a way that you can easily calculate. I recommend not trying to optimize here until you actually need it, i.e. when you notice certain queries being slower than other similar queries in your database.