I just realized that some of the tables I moved from Parquet to Cosmos DB are quite large, since Cosmos DB obviously doesn't compress data the way Parquet does. That results in a high cost. RUs don't cost me much, but storage is a bit high. Are there any good recommendations for reducing the size of collections in Cosmos DB, apart from excluding unneeded fields and indexes?
CodePudding user response:
Cosmos DB is not designed to be a cold store for massive amounts of data that are not actively queried. If you have a large amount of infrequently queried data, one suggestion is to enable Synapse Link and let it copy that data from Cosmos DB into the analytical store, which lives on a remote blob store in Parquet format. Once your data is in the analytical store, you can set a TTL on the Cosmos DB container so that data you are not actively using and querying for OLTP operations expires from it.
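As a rough sketch of the TTL setup described above, the container ends up with two independent TTL settings: `defaultTtl` for the transactional store and `analyticalStorageTtl` for the analytical store. The helper below only builds the container properties as a plain dict; the function name, the container id `events`, and the partition key `/deviceId` are hypothetical, and you would still apply the values through the portal, CLI, or an SDK.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60


def container_properties(container_id, partition_key_path,
                         transactional_ttl_days, analytical_ttl_seconds=-1):
    """Build container properties with separate TTLs per store (illustrative helper).

    defaultTtl: seconds after last modification before an item expires
                from the transactional (OLTP) store.
    analyticalStorageTtl: -1 keeps data in the analytical store indefinitely.
    """
    if transactional_ttl_days <= 0:
        raise ValueError("transactional_ttl_days must be positive")
    return {
        "id": container_id,
        "partitionKey": {"paths": [partition_key_path], "kind": "Hash"},
        "defaultTtl": transactional_ttl_days * SECONDS_PER_DAY,
        "analyticalStorageTtl": analytical_ttl_seconds,
    }


# Hypothetical container: keep 30 days hot, keep everything in analytical store.
props = container_properties("events", "/deviceId", transactional_ttl_days=30)
print(json.dumps(props, indent=2))
```

With these values, items drop out of the transactional store 30 days after their last write, while the analytical copy is retained.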
If you need to query the older data, you can provision a new Synapse workspace and notebooks and use SQL or Spark to query it. If you don't need to query it, you can just let the data remain there. Best of all, the storage costs are the same as regular blob storage, which is definitely less expensive than storage in Cosmos DB. That costs $0.25/GB per month because it sits on cluster SSD storage.
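To get a feel for the difference, here is a back-of-the-envelope calculation. The $0.25/GB figure is the Cosmos DB price quoted above; the analytical/blob price of $0.03/GB and the data sizes are assumptions for illustration only, so check the current pricing for your region.

```python
COSMOS_PRICE_PER_GB = 0.25              # transactional store, per GB per month
ASSUMED_ANALYTICAL_PRICE_PER_GB = 0.03  # assumption, not an official figure


def monthly_storage_cost(total_gb, hot_gb, use_analytical_store):
    """Estimate monthly storage cost in dollars.

    Without the analytical store, all data stays in the transactional store.
    With it, only the hot data stays transactional (the rest is removed by TTL)
    and the full data set is kept in the analytical store.
    """
    if not use_analytical_store:
        return total_gb * COSMOS_PRICE_PER_GB
    return (hot_gb * COSMOS_PRICE_PER_GB
            + total_gb * ASSUMED_ANALYTICAL_PRICE_PER_GB)


# Hypothetical workload: 1000 GB total, of which 100 GB is actively queried.
before = monthly_storage_cost(1000, 100, use_analytical_store=False)
after = monthly_storage_cost(1000, 100, use_analytical_store=True)
print(f"before: ${before:.2f}/month, after: ${after:.2f}/month")
```

This ignores the cost of queries against the analytical store, so treat it as a storage-only estimate.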