Home > database >  TSQL - Number groups based on distinct values in certain columns
TSQL - Number groups based on distinct values in certain columns

Time:11-10

Let's say I have a table like this:

| ID  | ColA | ColB | ColC | ... |
|-----|------|------|------|-----|
| 1   | 111  | XXX  | foo  |     |
| 1   | 111  | XXX  | bar  |     |
| ... | ...  | ...  | ...  |     |
| 1   | 111  | YYY  | foo  |     |
| 1   | 111  | YYY  | bar  |     |
| ... | ...  | ...  | ...  |     |
| 1   | 999  | XXX  | foo  |     |
| 1   | 999  | XXX  | bar  |     |
| ... | ...  | ...  | ...  |     |
| 1   | 999  | YYY  | foo  |     |
| 1   | 999  | YYY  | bar  |     |
| ... | ...  | ...  | ...  |     |
| 2   | 111  | XXX  | foo  |     |
| 2   | 111  | XXX  | bar  |     |
| ... | ...  | ...  | ...  |     |

There are further columns to the right with all sorts of other values.

I want to partition this table in T-SQL into distinct groups only by columns "ID", "ColA" and "ColB", without regard to all other columns. Then I want to sequentially number those groups. My final result should look like this:


| ID  | ColA | ColB | ColC | ... | GroupNumber |
|-----|------|------|------|-----|-------------|
| 1   | 111  | XXX  | foo  |     | 1           |
| 1   | 111  | XXX  | bar  |     | 1           |
| ... | ...  | ...  | ...  |     | ...         |
| 1   | 111  | YYY  | foo  |     | 2           |
| 1   | 111  | YYY  | bar  |     | 2           |
| ... | ...  | ...  | ...  |     | ...         |
| 1   | 999  | XXX  | foo  |     | 3           |
| 1   | 999  | XXX  | bar  |     | 3           |
| ... | ...  | ...  | ...  |     | ...         |
| 1   | 999  | YYY  | foo  |     | 4           |
| 1   | 999  | YYY  | bar  |     | 4           |
| ... | ...  | ...  | ...  |     | ...         |
| 2   | 111  | XXX  | foo  |     | 5           |
| 2   | 111  | XXX  | bar  |     | 5           |
| ... | ...  | ...  | ...  |     | ...         |

It seems like this should be an easy problem but I struggle to get a handle on it. I have a certain suspicion that this should work somehow with DENSE_RANK and the partitioning clause in that function. My approach is:

SELECT
    *,
    DENSE_RANK() OVER(
        PARTITION BY ID, ColA, ColB
        ORDER BY ColC
    ) AS GroupNumber
FROM my_table

but this keeps increasing the GroupNumber within each one of these blocks as well.

CodePudding user response:

If I'm understanding what you're looking for, you have the right idea, however you don't need to partition the data within the ranking function - you're looking for the rank of the combination of columns Id, ColA, and ColB within the entire dataset, not the rank of records within those combination of columns.

If that's the case, you simply would remove your partition clause in your dense_rank(), like this:

SELECT
    *,
    DENSE_RANK() OVER(ORDER BY ID, ColA, ColB) AS GroupNumber
FROM my_table

That assumes that you aren't trying to assign group #'s in any specific order other than the order of ID, ColA, and ColB, which I think is what you want, however you also used an "ORDER BY ColC" clause in your original example - I'm guessing you did that because you need to add an order by clause to a ranking function.

If you are however trying to order the groups a different way, would need to know that and would require something a little different.

  • Related