As the title suggests I'm wondering how to create an effective index for GROUP BY queries in CosmosDB.
Say the documents look something like:
{
"pk": "12345",
"speed": 500
},
{
"pk": "6789",
"speed": 100
}
Doing a query to find out the SUM of the speed grouped by the partition key would look something like:
SELECT c.pk, SUM(c.speed) FROM c WHERE c.pk IN ('12345','6789') GROUP BY c.pk
With about ~1.6 million documents this query costs 1489.51 RUs. However, splitting this up into two queries such as:
SELECT SUM(c.speed) FROM c WHERE c.pk = '12345'
SELECT SUM(c.speed) FROM c WHERE c.pk = '6789'
each of them cost only ~2.8 RUs each. Obviously the results would need some post-processing compared to the GROUP BY query to match. But a total of 5.6 RUs compared to 1489 RUs makes it worth it.
The indexing on the collection is as follows:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/pk",
"order": "ascending"
},
{
"path": "/speed",
"order": "ascending"
}
]
]
}
Am I completely missing something or how can the GROUP BY be so much more expensive? Is there any indexing I can do to bring it down?
Thanks in advance!
CodePudding user response:
Currently GROUP BY does not not yet use the index.
This is currently being worked on. I would revisit sometime towards the end of the year to verify it is supported.