I'm trying to get a distinct count of rows within a window function that has multiple levels of partitioning. Below is a sample of my data.
PRODUCT_ID | KEY_ID | STORECLUSTER |
---|---|---|
1000078 | 120 | LLNY |
1000078 | 202 | LLF |
1000078 | 202 | LLNY |
1000078 | 202 | LLNY |
I want to look at each PRODUCT_ID
and then each unique KEY_ID
and determine how many unique STORECLUSTERS
there are per KEY_ID
. For example PRODUCT_ID
1000078 has two unique KEY_ID
's (120 and 202) of which 120 has 1 unique STORECLUSTER
and 202 has 2 unqiue STORECLUSTER
's. I've tried using a RANK()
and DENSE_RANK()
but I can't seem to get the partitioning correct. I would like to get a table that looks like this:
PRODUCT_ID | KEY_ID | STORECLUSTER | STORECLUSTER_COUNT |
---|---|---|---|
1000078 | 120 | LLNY | 1 |
1000078 | 202 | LLF | 2 |
1000078 | 202 | LLNY | 2 |
1000078 | 202 | LLNY | 2 |
CodePudding user response:
Unfortunately, SQL Server does not support COUNT(DISTINCT
as a window function.
So you need to nest window functions. I find the simplest and most efficient method is MAX
over a DENSE_RANK
, but there are others.
The partitioning clause is the equivalent of GROUP BY
in a normal aggregate, then the value you are DISTINCT
ing goes in the ORDER BY
of the DENSE_RANK
. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.
SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;