In this example, I have a large number of IDs (1.6m ), with a 0/1 value in each of n rules.
Example Table:
| ID | Rule 1 | Rule 2 | Rule 3 | Rule 4 |.....
| :-----: | :----: | :----: | :----: | :----: |
| 12721 | 1 | 1 | 0 | 1 |
| 12722 | 1 | 1 | 1 | 1 |
| 12723 | 0 | 1 | 0 | 0 |
| 12724 | 0 | 1 | 0 | 0 |
| 12725 | 0 | 0 | 0 | 0 |
| 12726 | 1 | 1 | 0 | 1 |
| 12727 | 0 | 1 | 1 | 1 |
.......
I'm trying to create a 'relationship matrix' (I don't know the proper name for it) that how often two columns share the same value.
Desired result:
| ID | Rule 1 | Rule 2 | Rule 3 | Rule 4 |.....
| :-----: | :----: | :----: | :----: | :----: |
| Rule 1 | - | 3 | 1 | 3 |
| Rule 2 | 3 | - | 2 | 4 |
| Rule 3 | 1 | 2 | - | 2 |
| Rule 4 | 3 | 4 | 2 | - |
.....
I understand that I'll need to use some sort of PIVOT function, but I don't know the exact approach I need to even start.
I already have;
SELECT COUNT(*), SUM(Rule_1), SUM(Rule_2), SUM(Rule_3), SUM(Rule_4)...
FROM TABLE
for another requirement of the piece I'm working on
CodePudding user response:
Sequence of UNPIVOT, JOIN the UNPIVOT-ed data with itself on the conditions, COUNT by grouping on the 2 columns contains the rules, and then PIVOT to get the final result: (I tested in ORACLE but it's ANSI SQL so it should work as is in SQLServer, I just removed the "from dual")
WITH data(id, r1, r2, r3, r4) AS (
SELECT 12721, 1, 1, 0, 1 UNION ALL
SELECT 12722, 1, 1, 1, 1 UNION ALL
SELECT 12723, 0, 1, 0, 0 UNION ALL
SELECT 12724, 0, 1, 0, 0 UNION ALL
SELECT 12725, 0, 0, 0, 0 UNION ALL
SELECT 12726, 1, 1, 0, 1 UNION ALL
SELECT 12727, 0, 1, 1, 1
),
udata AS (
SELECT * FROM data
UNPIVOT (
val for col in (r1, r2, r3, r4)
)
),
sdata AS (
SELECT s1.id, s1.col as col1, s2.col as col2, s1.val
FROM udata s1
JOIN udata s2 ON s1.id = s2.id AND s1.val = s2.val AND s1.col <> s2.col AND s1.val = 1
),
cdata AS (
SELECT col1, col2, COUNT(*) AS c
FROM sdata
GROUP BY col1, col2
)
SELECT * FROM cdata
PIVOT (
MAX(c) FOR col2 IN ('R1' as r1, 'R2' as r2, 'R3' as r3, 'R4' as r4)
)
ORDER by col1
;
R1 3 1 3
R2 3 2 4
R3 1 2 2
R4 3 4 2