Home > Blockchain >  Indexing for GROUP BY in CosmosDB
Indexing for GROUP BY in CosmosDB

Time:09-28

As the title suggests I'm wondering how to create an effective index for GROUP BY queries in CosmosDB.

Say the documents look something like:

{
    "pk": "12345",
    "speed": 500
},
{
    "pk": "6789",
    "speed": 100
}

Doing a query to find out the SUM of the speed grouped by the partition key would look something like:

SELECT c.pk, SUM(c.speed) FROM c WHERE c.pk IN ('12345','6789') GROUP BY c.pk

With about ~1.6 million documents this query costs 1489.51 RUs. However, splitting this up into two queries such as:

SELECT SUM(c.speed) FROM c WHERE c.pk = '12345'

SELECT SUM(c.speed) FROM c WHERE c.pk = '6789'

each of them cost only ~2.8 RUs each. Obviously the results would need some post-processing compared to the GROUP BY query to match. But a total of 5.6 RUs compared to 1489 RUs makes it worth it.

The indexing on the collection is as follows:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ],
    "compositeIndexes": [
        [
            {
                "path": "/pk",
                "order": "ascending"
            },
            {
                "path": "/speed",
                "order": "ascending"
            }
        ]
    ]
}

Am I completely missing something or how can the GROUP BY be so much more expensive? Is there any indexing I can do to bring it down?

Thanks in advance!

CodePudding user response:

Currently GROUP BY does not not yet use the index.

This is currently being worked on. I would revisit sometime towards the end of the year to verify it is supported.

  • Related