Azure Data Explorer partitioning policy-CodePudding

The documentation on ADX partitioning policy(https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy#the-data-partitioning-process) mentions that you need to set a MaxPartitionCount while using a hash partition key. It also states that this value should be in the range (1,2048] and recommends starting with 128.

Question: If I have a column with a cardinality of 100,000. Shouldn't the max-partition count be 100,000? Shouldn't ADX create a partition for each distinct value in the column? Why is it even required to fill out this property MaxPartitionCount?

CodePudding user response：

In recommended scenarios (detailed in the doc you've linked to) - The end goal isn't to have a separate partition for each distinct value of the partition key.

Having an extreme number of partitions (100k in your question, or billions in case of a unique device ID) may result with an extreme amount of small data shards, which would be sub-optimal.
Even with "only" 128 as the max partition count, alongside default built-in indexing (regardless of explicit data partitioning) - the ability to narrow the full data set down very significantly at query planning time to a small number of partitions/shards can result with significant reduction in resources utilization and execution time.

For further reading: kusto.blog.

Generally, not following the guidelines and recommendations in the documentation isn't likely to lead you to optimal results.