How expensive is it to "index everything" on CosmosDB? (default behaviour)

Time:11-11

By default, all data in Azure Cosmos DB is indexed, i.e. every property of every document/item gets consistent, automatic indexing. However, the cost of storing this index data is not clearly visible to end users. How would you calculate or track the cost of storing and using the index-related data?

As the billing is only related to RU/s and data storage, it is not clear how the indexing strategy affects billing. Also, I wonder if the RU/s necessary for intensive writes may be increased because of the indexes. If so, indexes in CosmosDB should be excluded, and only necessary properties have to be indexed, thus reducing the overall performance.

Answer:

As the billing is only related to RU/s and data storage, it is not clear how the indexing strategy affects billing.

Indexing strategy affects billing in two ways. If you index everything, the index consumes more storage, which increases your bill. And when an item is written to Cosmos DB, part of your RU/s is spent on indexing that item, so each write consumes more RU/s, which also increases your bill.
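To make the "index only what you need" option concrete, here is a minimal sketch of an indexing policy that excludes every path and then includes only selected properties. The helper name and the property paths (/customerId, /orderDate) are hypothetical; the policy is built as a plain dictionary, and how you apply it to a container (portal, SDK, CLI) is left out.

```python
import json

def selective_indexing_policy(indexed_paths):
    """Build an indexing policy that indexes only the given property paths."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # Index only the listed scalar properties, e.g. "/customerId/?"
        "includedPaths": [{"path": p} for p in indexed_paths],
        # Exclude every other path from the index
        "excludedPaths": [{"path": "/*"}],
    }

# Hypothetical property names, for illustration only
policy = selective_indexing_policy(["/customerId/?", "/orderDate/?"])
print(json.dumps(policy, indent=2))
```

Queries that filter on a property left out of the index can become more expensive, so the included paths should follow your actual query patterns.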

You may find these links helpful in optimizing the costs as far as indexing is concerned:

Also, I wonder if the RU/s necessary for intensive writes may be increased because of the indexes.

That is correct.
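One way to see this for your own data is to compare the request charge of the same writes under different indexing policies. Cosmos DB reports the RU cost of each operation in the x-ms-request-charge response header. The sketch below only shows the bookkeeping: it assumes you have already collected the response headers as dictionaries, and the charge values are placeholders, not measurements.

```python
def request_charge(headers):
    """Return the RU charge reported in one response's headers (0.0 if absent)."""
    return float(headers.get("x-ms-request-charge", 0.0))

def total_charge(responses):
    """Sum the RU charges over a list of response-header dictionaries."""
    return sum(request_charge(h) for h in responses)

# Placeholder header values for illustration only, NOT real measurements
writes_index_everything = [
    {"x-ms-request-charge": "10.5"},
    {"x-ms-request-charge": "11.0"},
]
writes_selective_index = [
    {"x-ms-request-charge": "6.5"},
    {"x-ms-request-charge": "6.5"},
]

print("index everything:", total_charge(writes_index_everything), "RU")
print("selective index: ", total_charge(writes_selective_index), "RU")
```

Running the same batch of writes against two test containers with different policies and comparing the totals gives a direct estimate of what the default policy costs you in RU.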

If so, indexes in CosmosDB should be excluded, and only necessary properties have to be indexed, thus reducing the overall performance.

For bulk writes, it is recommended that you turn off indexing completely before the bulk write and re-enable it once the write operations are complete. You can also request Lazy Indexing as described here.
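The turn-off-then-re-enable approach amounts to swapping the container's indexing policy around the load. A minimal sketch of the two policy documents, assuming your normal policy is available as a dictionary; the helper name is hypothetical, and setting "automatic" to false alongside the "none" mode is an assumption made here to be safe:

```python
import copy

def bulk_load_policies(normal_policy):
    """Return (policy to apply during the load, policy to restore afterwards)."""
    # Indexing disabled while the bulk write runs
    during_load = {"indexingMode": "none", "automatic": False}
    # Deep copy so the caller's policy is never mutated
    after_load = copy.deepcopy(normal_policy)
    return during_load, after_load

# Example normal policy (the default "index everything" shape)
normal = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
}
during, after = bulk_load_policies(normal)
print(during)
print(after)
```

Restoring the original policy makes Cosmos DB rebuild the index in the background, which itself consumes RUs, and query results may be incomplete until that rebuild finishes.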

UPDATE - Adding comment from @MarkBrown (part of CosmosDB team):

I don't recommend using lazy indexing as there is no way to know when it is caught up and the data can be queried consistently. Also, a well-tuned indexing policy can be fine in bulk scenarios. Generally best reserved for massive one-time data loads. Not recommended for regular, batch type data ingestion.
