In my scenario, I have a Cosmos DB and an Azure Data Explorer (ADX) cluster. The raw data is written to Cosmos DB, and ADX is updated by an Azure Function (Cosmos DB trigger). We use ADX for running complex queries and analyses. I have another app that delivers raw data to the customer via an API. My question is: which one is more suitable for both bulk and normal reads when providing raw data to the customer? (The app is able to read from either of them.)
Which factors should I consider?
CodePudding user response:
Cosmos DB is designed for operational workloads: serving a large number of small requests (reads/writes).
For retrieving a few rows/docs, use Cosmos DB.
For millions of rows/docs and above (which seems to be your use case), definitely use ADX.
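Since the API can read from both stores, the rule of thumb above could be sketched as a small routing helper. This is only an illustration: `choose_store`, `Store`, and especially the `BULK_ROW_THRESHOLD` value are hypothetical and not from the answer, which only contrasts "a few" rows with "millions". The real cut-over point should come from measuring your own workload.

```python
from enum import Enum


class Store(Enum):
    COSMOS_DB = "cosmosdb"
    ADX = "adx"


# Hypothetical cut-over point; tune it from measured latency and cost.
BULK_ROW_THRESHOLD = 100_000


def choose_store(expected_rows: int, threshold: int = BULK_ROW_THRESHOLD) -> Store:
    """Pick the backend for a read based on how many rows it is expected to return."""
    if expected_rows < 0:
        raise ValueError("expected_rows must be non-negative")
    # Small lookups stay on the operational store; bulk reads go to ADX.
    return Store.COSMOS_DB if expected_rows < threshold else Store.ADX
```

For example, `choose_store(10)` selects Cosmos DB, while `choose_store(5_000_000)` selects ADX.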
CodePudding user response:
When you say "providing raw data to customer", do you mean loading 1K rows to display on a web page / Power BI, or essentially downloading a fraction of the database for the customer to then import into the tool of their choice?
If it's the latter, I would recommend considering exporting the data from ADX using continuous export (cf https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/data-export/continuous-data-export) or using data share (cf https://docs.microsoft.com/en-us/azure/data-explorer/data-share).
If it's the former, i.e. reading a big chunk to display ad hoc (e.g. without warning, I want all rows where X==x, right now), then both can do it. The challenge with Cosmos DB is that it's going to create an RU spike in your consumption, so depending on your deployment model, that could disrupt the rest of the workload. You would also likely need a cross-partition query (unless the customer happens to request data from a single partition), which tends to be a bit less efficient.
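To make the single-partition vs. cross-partition distinction concrete, here is a minimal sketch that builds the keyword arguments for a query in the `azure-cosmos` Python SDK. The helper `build_query_kwargs` and the field name `deviceId` are hypothetical; `query`, `parameters`, `partition_key`, and `enable_cross_partition_query` are, to my knowledge, parameters of `ContainerProxy.query_items`, but check them against the SDK version you use.

```python
from typing import Any, Dict, Optional


def build_query_kwargs(
    field: str, value: Any, partition_key: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a Cosmos DB query filtering on one field."""
    # The field name is interpolated into the query text, so restrict it
    # to a plain identifier; the value goes through a query parameter.
    if not field.isidentifier():
        raise ValueError(f"unsafe field name: {field!r}")
    kwargs: Dict[str, Any] = {
        "query": f"SELECT * FROM c WHERE c.{field} = @value",
        "parameters": [{"name": "@value", "value": value}],
    }
    if partition_key is not None:
        # Scoped to one logical partition: the cheaper case.
        kwargs["partition_key"] = partition_key
    else:
        # No partition key known: the query fans out across partitions.
        kwargs["enable_cross_partition_query"] = True
    return kwargs
```

Assuming `container` is a `ContainerProxy`, the call would look like `container.query_items(**build_query_kwargs("deviceId", "abc", partition_key="abc"))`. Passing the partition key whenever the customer's request allows it keeps the RU cost of the query down.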
With ADX you're limited to 500K records per query result by default, but you can raise that limit if need be. This isn't without disruption on ADX either, since the request will hog a "request slot" while it's running. If you have many of those concurrently, that could become a challenge, so be careful with that.
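One way to raise the record limit is to prefix the query with a `set` statement. The sketch below only builds the query text; the helper `with_truncation_limit` and the table name `RawData` are hypothetical. `truncationmaxrecords` and `notruncation` are the Kusto request options I have in mind; the separate result-size limit (`truncationmaxsize`) is left untouched here.

```python
from typing import Optional


def with_truncation_limit(kql: str, max_records: Optional[int] = None) -> str:
    """Prefix a KQL query with a set statement that changes the record limit.

    With max_records=None, result truncation is disabled entirely.
    """
    if max_records is None:
        return f"set notruncation;\n{kql}"
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    return f"set truncationmaxrecords={max_records};\n{kql}"
```

For example, `with_truncation_limit("RawData | where X == 'x'", 2_000_000)` yields a query that may return up to two million records. Disabling truncation altogether is riskier, because a single large request can then hold its slot and memory for much longer.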
FYI, we are coming out with a connector for Cosmos DB that lets you ingest data arriving in Cosmos DB into ADX in near real time (very similar to the Event Hub managed pipeline). It is in a limited private preview right now. It will likely simplify your architecture in the future.