I have created a composite index for these fields:
'DayPublished': Ascending, 'MonthPublished': Ascending, 'IsExperimentalFeaturesEnabled': Ascending, 'NumPlays': Descending, 'DatePublished': Descending
But for some reason, Firestore won't let me run a query on only a subset of these fields. For example, if I query with just 'IsExperimentalFeaturesEnabled' = true and sort by 'NumPlays' descending, Firestore doesn't use the composite index described above. Instead, it gives me a link to create a new index:
'IsExperimentalFeaturesEnabled': Ascending, 'NumPlays': Descending
Why is that? I'm not too sure about the inner workings of indexes, but to me, it seems like subset queries should be able to use the data of their superset indexes.
Also, a quick related question: how does having more fields in an index affect the storage that index uses? Is it linear, or ^N+1 for every new field added?
Answer:
it seems like subset queries should be able to use the data of their superset indexes?
No, that's not the way it works. An index is going to use all of the indexed fields to create a total ordering of all eligible documents using those fields, for queries that use all of those fields. That's what makes query results fast and scalable - it just has to find a range of documents that are easy for the index to find. If you don't use one of the fields in your query, then the total ordering using the full set of fields no longer holds, and the index fails to be fast at scale. That's why you need a new index using only the fields used in the query.
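To make the ordering argument concrete, here is a toy Python model. It is a simplified illustration, not how Firestore actually stores indexes. An index is modeled as a sorted list of key tuples, descending fields are modeled by negating numbers, the document values are made up, and DatePublished is left out to keep the example short. In the index from the question, the documents matching IsExperimentalFeaturesEnabled = true end up scattered and out of NumPlays order. In the two-field index they form one contiguous range that is already sorted.

```python
from bisect import bisect_left

# Made-up documents that use the field names from the question.
docs = [
    {"id": "a", "DayPublished": 1, "MonthPublished": 1, "IsExperimentalFeaturesEnabled": True, "NumPlays": 50},
    {"id": "b", "DayPublished": 1, "MonthPublished": 2, "IsExperimentalFeaturesEnabled": False, "NumPlays": 90},
    {"id": "c", "DayPublished": 2, "MonthPublished": 1, "IsExperimentalFeaturesEnabled": True, "NumPlays": 70},
    {"id": "d", "DayPublished": 3, "MonthPublished": 1, "IsExperimentalFeaturesEnabled": True, "NumPlays": 10},
]
by_id = {d["id"]: d for d in docs}


def build_index(docs, fields):
    """Toy index: a sorted list of (key tuple, doc id).

    `fields` is a list of (name, "asc" or "desc"); descending numeric fields
    are negated so that a single ascending sort gives the declared order.
    """
    entries = []
    for d in docs:
        key = tuple(d[name] if order == "asc" else -d[name] for name, order in fields)
        entries.append((key, d["id"]))
    entries.sort()
    return entries


def range_scan(index, prefix):
    """Binary-search to the first entry whose key starts with `prefix`,
    then read forward until the prefix no longer matches."""
    start = bisect_left(index, (prefix,))
    result = []
    for key, doc_id in index[start:]:
        if key[: len(prefix)] != prefix:
            break  # left the contiguous block of matches
        result.append(doc_id)
    return result


# The index from the question (minus DatePublished) and the one Firestore suggested.
superset = build_index(docs, [("DayPublished", "asc"), ("MonthPublished", "asc"),
                              ("IsExperimentalFeaturesEnabled", "asc"), ("NumPlays", "desc")])
subset = build_index(docs, [("IsExperimentalFeaturesEnabled", "asc"), ("NumPlays", "desc")])

# Superset index: matches sit at non-adjacent positions and are not sorted by NumPlays.
matches = [(pos, doc_id) for pos, (_, doc_id) in enumerate(superset)
           if by_id[doc_id]["IsExperimentalFeaturesEnabled"]]
print([pos for pos, _ in matches])                           # [0, 2, 3]
print([by_id[doc_id]["NumPlays"] for _, doc_id in matches])  # [50, 70, 10]

# Subset index: the same query is one contiguous range, already NumPlays-descending.
print(range_scan(subset, (True,)))                           # ['c', 'a', 'd']
```

Because the superset index is sorted by DayPublished first, answering the subset query from it would mean reading every entry and sorting the matches. That full scan is exactly what indexes exist to avoid.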
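Your second question is answered by the documentation on storage size calculations. Additional fields do cost additional storage for the index: each index entry includes the value of every indexed field, so the size grows with each field you add (linearly, not exponentially).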
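A rough back-of-the-envelope sketch of why the growth is linear. The per-value sizes below are how I read Firestore's storage-size rules and should be double-checked there. The fixed per-entry overhead is a HYPOTHETICAL placeholder, because the real figure depends on document names and paths. The field types are guesses about the question's schema.

```python
# Per-value sizes as I read Firestore's "storage size calculations" page; verify there.
VALUE_SIZE_BYTES = {"integer": 8, "boolean": 1, "timestamp": 8}

# HYPOTHETICAL fixed cost per index entry (document name, bookkeeping bytes, ...).
ENTRY_OVERHEAD_BYTES = 100


def entry_size(field_types):
    """Estimated size of one index entry: fixed overhead + each indexed value."""
    return ENTRY_OVERHEAD_BYTES + sum(VALUE_SIZE_BYTES[t] for t in field_types)


def index_size(num_docs, field_types):
    """Roughly one entry per document that has every indexed field (ignoring arrays)."""
    return num_docs * entry_size(field_types)


# Field types are guesses for the question's schema.
fields = [("DayPublished", "integer"), ("MonthPublished", "integer"),
          ("IsExperimentalFeaturesEnabled", "boolean"), ("NumPlays", "integer"),
          ("DatePublished", "timestamp")]

# Each extra field adds only its own value size to every entry: linear growth.
for k in range(1, len(fields) + 1):
    print(k, entry_size([t for _, t in fields[:k]]))  # 108, 116, 117, 125, 133

print(index_size(1_000_000, [t for _, t in fields]))  # 133000000 bytes under these assumptions
```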
Answer:
I don't think I can provide a better answer than Doug's, but here are my thoughts. In my opinion, things are very simple. By default, Firestore automatically creates a single-field index for each field in a document. So whichever single field you query on, there is nothing you need to do explicitly.
If such simple queries require an index, it's clear that more complex queries require one too. Why? Because Firestore needs indexes in order to scale massively. Since nobody knows ahead of time which queries you need to perform in your application, creating the indexes for your particular queries remains your responsibility. Besides that, Firestore cannot create indexes for all field combinations in advance, because the number of possible combinations is very large. Moreover, each index counts towards the index entry count limit.
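To get a feel for "the large number of possible field combinations", here is a quick count in Python. It is a rough upper bound under simplifying assumptions. It counts ordered lists of two or more fields, each ascending or descending, and ignores array-contains modes and any rules about which orderings a query could actually use. Field order matters in a composite index, which is why permutations rather than combinations are counted.

```python
from math import perm


def possible_composite_indexes(num_fields):
    """Count ordered field lists of length >= 2 where each field is asc or desc."""
    return sum(perm(num_fields, k) * 2 ** k for k in range(2, num_fields + 1))


for n in (3, 5, 10):
    print(n, possible_composite_indexes(n))  # 5 fields alone give 6320 candidates
```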
So when we create an index for a query that returns ordered results based on five fields, that index is specific to that combination of fields. That being said, as you already noticed, if you perform a query based on a combination of only two of those fields, for example, it will not work. If you need such a query, a new index is required. And it makes sense, since each such query needs its own matching index.
So in order for Firestore to guarantee high query performance, those constraints are needed. Remember that a composite index stores a sorted mapping of all the documents in a collection, based on an ordered list of fields to index.
Everything in Firestore is about scaling, and we cannot have very fast queries without these constraints. However, when we want to perform queries on many different combinations of fields, we might encounter a problem: we can reach the index limits very quickly.
As said, although Firestore uses an index for every query, it does not necessarily require one index per query. According to the official documentation:
For queries with multiple equality (==) clauses and, optionally, an orderBy clause, Cloud Firestore can reuse existing indexes. Cloud Firestore can merge the indexes for simple equality filters to build the composite indexes needed for larger equality queries.
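Here is a simplified Python sketch of the idea. Assume two hypothetical composite indexes: (IsExperimentalFeaturesEnabled asc, NumPlays desc) and (MonthPublished asc, NumPlays desc). Each equality filter selects one range of its index, already ordered by NumPlays. The two ranges can then be intersected with a merge join, with no extra sort needed. The merge join is a textbook stand-in, not a description of Firestore's internal algorithm. The documents and the second index are made up.

```python
# Made-up documents; only the fields used by the query are included.
docs = [
    {"id": "a", "MonthPublished": 1, "IsExperimentalFeaturesEnabled": True, "NumPlays": 50},
    {"id": "b", "MonthPublished": 2, "IsExperimentalFeaturesEnabled": False, "NumPlays": 90},
    {"id": "c", "MonthPublished": 1, "IsExperimentalFeaturesEnabled": True, "NumPlays": 70},
    {"id": "d", "MonthPublished": 1, "IsExperimentalFeaturesEnabled": True, "NumPlays": 10},
    {"id": "e", "MonthPublished": 2, "IsExperimentalFeaturesEnabled": True, "NumPlays": 60},
    {"id": "f", "MonthPublished": 1, "IsExperimentalFeaturesEnabled": False, "NumPlays": 40},
]


def equality_range(docs, field, value):
    """Stand-in for reading the `field == value` range of a hypothetical
    (field asc, NumPlays desc) index: (-NumPlays, doc id) pairs in index order."""
    return sorted((-d["NumPlays"], d["id"]) for d in docs if d[field] == value)


def merge_join(left, right):
    """Intersect two lists sorted in the same order. Both cursors only move
    forward, so the output keeps that order (here: NumPlays descending)."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i][1])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # left cursor is behind: advance it
        else:
            j += 1  # right cursor is behind: advance it
    return out


# IsExperimentalFeaturesEnabled == true AND MonthPublished == 1, ordered by NumPlays desc.
experimental = equality_range(docs, "IsExperimentalFeaturesEnabled", True)
month_one = equality_range(docs, "MonthPublished", 1)
print(merge_join(experimental, month_one))  # ['c', 'a', 'd']
```

With this approach, two two-field indexes can serve several equality combinations, instead of one dedicated composite index per combination.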
So I think you should take advantage of index merging in order to reduce indexing costs.