Home > Back-end >  Creating a LARGE matrix of values using a PIVOT function
Creating a LARGE matrix of values using a PIVOT function

Time:11-09

In this example, I have a large number of IDs (1.6m ), with a 0/1 value in each of n rules.

Example Table:

| ID      | Rule 1 | Rule 2 | Rule 3 | Rule 4 |.....
| :-----: | :----: | :----: | :----: | :----: |
| 12721   | 1      | 1      | 0      | 1      |  
| 12722   | 1      | 1      | 1      | 1      | 
| 12723   | 0      | 1      | 0      | 0      |
| 12724   | 0      | 1      | 0      | 0      |
| 12725   | 0      | 0      | 0      | 0      |
| 12726   | 1      | 1      | 0      | 1      |
| 12727   | 0      | 1      | 1      | 1      |
.......

I'm trying to create a 'relationship matrix' (I don't know the proper name for it) that how often two columns share the same value.

Desired result:

| ID      | Rule 1 | Rule 2 | Rule 3 | Rule 4 |.....
| :-----: | :----: | :----: | :----: | :----: |
| Rule 1  | -      | 3      | 1      | 3      |
| Rule 2  | 3      | -      | 2      | 4      |
| Rule 3  | 1      | 2      | -      | 2      |
| Rule 4  | 3      | 4      | 2      | -      |
.....

I understand that I'll need to use some sort of PIVOT function, but I don't know the exact approach I need to even start.

I already have;

SELECT COUNT(*), SUM(Rule_1), SUM(Rule_2), SUM(Rule_3), SUM(Rule_4)...
FROM TABLE

for another requirement of the piece I'm working on

CodePudding user response:

Sequence of UNPIVOT, JOIN the UNPIVOT-ed data with itself on the conditions, COUNT by grouping on the 2 columns contains the rules, and then PIVOT to get the final result: (I tested in ORACLE but it's ANSI SQL so it should work as is in SQLServer, I just removed the "from dual")

WITH data(id, r1, r2, r3, r4) AS (
    SELECT 12721, 1, 1, 0, 1 UNION ALL
    SELECT 12722, 1, 1, 1, 1 UNION ALL 
    SELECT 12723, 0, 1, 0, 0 UNION ALL
    SELECT 12724, 0, 1, 0, 0 UNION ALL
    SELECT 12725, 0, 0, 0, 0 UNION ALL
    SELECT 12726, 1, 1, 0, 1 UNION ALL
    SELECT 12727, 0, 1, 1, 1 
),
udata AS (
    SELECT * FROM data
    UNPIVOT (
        val for col in (r1, r2, r3, r4)
    )
),
sdata AS (
    SELECT s1.id, s1.col as col1, s2.col as col2, s1.val
    FROM udata s1
    JOIN udata s2 ON s1.id = s2.id AND s1.val = s2.val AND s1.col <> s2.col AND s1.val = 1
),
cdata AS (
    SELECT col1, col2, COUNT(*) AS c 
    FROM sdata
    GROUP BY col1, col2

)
SELECT * FROM cdata
PIVOT (
    MAX(c) FOR col2 IN ('R1' as r1, 'R2' as r2, 'R3' as r3, 'R4' as r4)
)
ORDER by col1
;


R1      3   1   3
R2  3       2   4
R3  1   2       2
R4  3   4   2   
  • Related