Azure Data Explorer partitioning policy

Time:12-20

The documentation on the ADX partitioning policy (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy#the-data-partitioning-process) says that you need to set MaxPartitionCount when using a hash partition key. It also states that this value should be in the range (1,2048] and recommends starting with 128.

Question: If I have a column with a cardinality of 100,000, shouldn't the max partition count be 100,000? Shouldn't ADX create a partition for each distinct value in the column? Why is it even required to set the MaxPartitionCount property?

CodePudding user response:

In the recommended scenarios (detailed in the doc you linked to), the end goal isn't to have a separate partition for each distinct value of the partition key.

  • Having an extreme number of partitions (100k in your question, or billions in the case of a unique device ID) may result in an extreme number of small data shards, which would be suboptimal.

  • Even with "only" 128 as the max partition count, alongside the default built-in indexing (which applies regardless of explicit data partitioning), the ability to narrow the full data set at query planning time to a small number of partitions/shards can significantly reduce resource utilization and execution time.
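
To make the two points above concrete, here is a rough Python sketch of hash partitioning with a bounded partition count. It is only an illustration, not how ADX is implemented: SHA-256 stands in for the service's own hash function, and the names `partition_of` and `device_ids` are made up for this example.

```python
import hashlib
from collections import Counter

# The documentation's recommended starting value for MaxPartitionCount.
MAX_PARTITION_COUNT = 128


def partition_of(value: str, max_partition_count: int = MAX_PARTITION_COUNT) -> int:
    """Map a key value to a partition number in [0, max_partition_count).

    Assumption: SHA-256 is a stand-in hash chosen for illustration only.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_partition_count


# A hypothetical column with a cardinality of 100,000.
device_ids = [f"device-{i}" for i in range(100_000)]

# Many distinct values share each partition; no partition is created per value.
values_per_partition = Counter(partition_of(d) for d in device_ids)
partitions_used = len(values_per_partition)

# An equality filter on one value only needs the single partition that value
# hashes to, so at most 1 / MAX_PARTITION_COUNT of the partitions are relevant.
target_partition = partition_of("device-42")
fraction_of_partitions = 1 / MAX_PARTITION_COUNT

print(f"partitions used: {partitions_used}")
print(f"'device-42' lives in partition {target_partition}")
print(f"fraction of partitions to consider: {fraction_of_partitions:.4f}")
```

With 100,000 distinct values and 128 partitions, each partition holds several hundred distinct values, yet a filter on a single value still rules out all but one partition. Raising the count to 100,000 would shrink each partition to roughly one value, which is the "extreme number of small data shards" case described above.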

For further reading: kusto.blog.

Generally, not following the guidelines and recommendations in the documentation isn't likely to lead you to optimal results.
