Home > Blockchain >  Distinct Counts in a Window Function
Distinct Counts in a Window Function

Time:05-09

I'm trying to get a distinct count of rows within a window function that has multiple levels of partitioning. Below is a sample of my data.

PRODUCT_ID KEY_ID STORECLUSTER
1000078 120 LLNY
1000078 202 LLF
1000078 202 LLNY
1000078 202 LLNY

I want to look at each PRODUCT_ID and then each unique KEY_ID and determine how many unique STORECLUSTERS there are per KEY_ID. For example PRODUCT_ID 1000078 has two unique KEY_ID's (120 and 202) of which 120 has 1 unique STORECLUSTER and 202 has 2 unqiue STORECLUSTER's. I've tried using a RANK() and DENSE_RANK() but I can't seem to get the partitioning correct. I would like to get a table that looks like this:

PRODUCT_ID KEY_ID STORECLUSTER STORECLUSTER_COUNT
1000078 120 LLNY 1
1000078 202 LLF 2
1000078 202 LLNY 2
1000078 202 LLNY 2

CodePudding user response:

Unfortunately, SQL Server does not support COUNT(DISTINCT as a window function.

So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.

The partitioning clause is the equivalent of GROUP BY in a normal aggregate, then the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.

SELECT
  PRODUCT_ID,
  KEY_ID,
  STORECLUSTER,
  STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
    SELECT *,
      rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
    FROM YourTable t
) t;

db<>fiddle

  • Related