The documentation on the ADX partitioning policy (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy#the-data-partitioning-process) mentions that you need to set MaxPartitionCount when using a hash partition key. It also states that this value should be in the range (1,2048] and recommends starting with 128.
Question: If I have a column with a cardinality of 100,000, shouldn't the max partition count be 100,000? Shouldn't ADX create a partition for each distinct value in the column? Why is it even required to set the MaxPartitionCount property?
Answer:
In the recommended scenarios (detailed in the doc you've linked to), the end goal isn't to have a separate partition for each distinct value of the partition key.
Having an extreme number of partitions (100k in your question, or billions in the case of a unique device ID) may result in an extremely large number of small data shards, which would be sub-optimal.
Even with "only" 128 as the max partition count, alongside the default built-in indexing (which applies regardless of explicit data partitioning), the ability to narrow the full data set down at query planning time to a small number of partitions/shards can significantly reduce resource utilization and execution time.
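To make this concrete, here is a minimal Python sketch of the general idea, not of ADX's actual implementation. As I understand the documentation, a hash partition key assigns each value to a partition by taking a hash of the value modulo MaxPartitionCount, so many distinct values share a partition. The `partition_of` helper, the `device-N` key names, and the use of SHA-256 (picked only to keep the example dependency-free) are all assumptions made for illustration.

```python
import hashlib

MAX_PARTITION_COUNT = 128  # the starting value the documentation recommends


def partition_of(value: str, max_partition_count: int = MAX_PARTITION_COUNT) -> int:
    """Map a partition-key value to a partition index via hash modulo.

    Assumption: SHA-256 is a stand-in hash used only for this sketch;
    it is not the hash function ADX uses internally.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_partition_count


# 100,000 distinct (hypothetical) key values, as in the question.
keys = [f"device-{i}" for i in range(100_000)]

# Group the keys by the partition they hash into.
partitions: dict[int, list[str]] = {}
for key in keys:
    partitions.setdefault(partition_of(key), []).append(key)

# Cardinality is 100,000, but the number of partitions never exceeds 128.
print(f"distinct keys: {len(keys)}, non-empty partitions: {len(partitions)}")

# A filter on a single key value only needs the one partition it hashes to;
# all other partitions can be ruled out before any data is scanned.
target = partition_of("device-42")
print(f"'device-42' lives in partition {target} of {MAX_PARTITION_COUNT}")
```

Each partition here holds several hundred distinct keys on average, yet an equality filter on one key still eliminates all but one partition up front. Raising the count to 100,000 would not make that pruning meaningfully better; it would mostly produce many more, much smaller shards.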
For further reading: kusto.blog.
Generally, not following the guidelines and recommendations in the documentation isn't likely to lead you to optimal results.