Home > Net >  Pivoting data to find unique values for specific columns
Pivoting data to find unique values for specific columns

Time:06-13

I have a table:

ID col1 col2 col3 col4 col5
1 Y G CA DA EA
2 Y G CA DA EA
3 Y G CA DA EA
1 X Q RA DA EA
2 X Q CA DA EA

For col1 and col2, I want to create 4 additional columns. Two for the distinct values for col1 (X,Y) and two for the distinct values for col2 (G,Q).

For example, I want to create a new column (col1_x) to display 'Y' if col1 have X. Basically for each ID, col3, col4 and col5 combination - I want to show if col1 have X , col1 have Y, col2 have G and col2 have Q. Each as an individual column. How do I do that? I think I'll need a pivot/group by as I would want to eliminate duplicated rows of data.

Desired output:

ID col3 col4 col5 col1_X col1_Y col2_G col2_Q
1 CA DA EA Y Y Y Y
1 RA DA EA Y Y Y Y
2 CA DA EA Y Y Y Y
3 CA DA EA N Y Y N

CodePudding user response:

Test data

drop table if exists #test;
drop table if exists #result;

create table #test (
    [ID] int,
    [col1] varchar(1),
    [col2] varchar(1),
    [col3] varchar(2),
    [col4] varchar(2),
    [col5] varchar(2));

insert into #test ([ID], [col1], [col2], [col3], [col4], [col5]) 
values
(1, 'Y', 'G', 'CA', 'DA', 'EA'),
(2, 'Y', 'G', 'CA', 'DA', 'EA'),
(3, 'Y', 'Q', 'CA', 'DA', 'EA'),
(1, 'X', 'G', 'RA', 'DA', 'EA'),
(2, 'X', 'G', 'CA', 'DA', 'EA'),
(3, 'X', 'Q', 'CA', 'DA', 'EA');

1) Pivot solution

Here is a pivot approach - it is not purely dynamic as you have to provide "column names" in the select which are the values you are looking for (X/Y and G/Q) - it needs to be done twice, once for each column you are interested in (col1 and col2) so it ends up looking more complex than might be hoped for, but it is is really one simple pivot, done twice, then cleaned up per your desired output. It seems like a 'trick' to me, still - after all these years! More info on PIVOT here: Using PIVOT and UNPIVOT.

select 
    a.ID, a.col3, a.col4, a.col5, 
    case when cast(a.[X] as bit) = 1 then 'Y' else 'N' end as Col1_X, 
    case when cast(a.[Y] as bit) = 1 then 'Y' else 'N' end as col1_Y,
    case when cast(b.[G] as bit) = 1 then 'Y' else 'N' end as col2_G,
    case when cast(b.[Q] as bit) = 1 then 'Y' else 'N' end as col2_Q
from
(
    select 
        ID, col3, col4, col5, [X], [Y]
        from #test
        pivot(
            Count(col1) for [col1] in ([X], [Y])
            ) as tmp 
) a
inner join
(
    select 
        ID, col3, col4, col5, [G], [Q]
        from #test
        pivot(
            Count(col1) for [col2] in ([G], [Q])
            ) as tmp 
) b
on a.ID = b.ID
and a.col3 = b.col3
and a.col4 = b.col4
and a.col5 = b.col5

2) Naive solution

I created a "naive" solution that I think is not really so bad and has the benefit of being understandable to anyone with a little sql skill, whereas it is a little obscure how pivot is working in a case like this (where we are not "really" aggregating information such as sums, averages, or real counts).

-- create the output table
select 
distinct ID, col3, col4, col5, 
    'N' as col1_X, 'N' as col1_Y, 'N' as col2_G, 'N' as col2_Q
into #result 
from #Test;

-- update the results
update t1
set t1.col1_X = 'Y'
from #result t1
where exists (select * from #test t2 
    where t2.ID = t1.ID
    and t2.col3 = t1.col3
    and t2.col4 = t1.col4
    and t2.col5 = t1.col5 
    and t2.col1 = 'X');

update t1
set t1.col1_Y = 'Y'
from #result t1
where exists (select * from #test t2 
    where t2.ID = t1.ID
    and t2.col3 = t1.col3
    and t2.col4 = t1.col4
    and t2.col5 = t1.col5
    and t2.col1 = 'Y');

update t1
set t1.col2_G = 'Y'
from #result t1
where exists (select * from #test t2 
    where t2.ID = t1.ID
    and t2.col3 = t1.col3
    and t2.col4 = t1.col4
    and t2.col5 = t1.col5
    and t2.col2 = 'G');

update t1
set t1.col2_Q = 'Y'
from #result t1
where exists (select * from #test t2 
    where t2.ID = t1.ID
    and t2.col3 = t1.col3
    and t2.col4 = t1.col4
    and t2.col5 = t1.col5 
    and t2.col2 = 'Q');

select * from #result;

Result in both cases:

ID col3 col4 col5 col1_X col1_Y col2_G col2_Q
1 CA DA EA N Y Y N
1 RA DA EA Y N Y N
2 CA DA EA Y Y Y N
3 CA DA EA Y Y N Y

CodePudding user response:

select col3, col4, col5,
max(col1_X),
max(col1_Y),
max(col2_G),
max(col2_Q)

from 
(select col3, col4, col5,
case when col1 = 'X' then 'Y' else 'N' end as col1_X,
case when col1 = 'Y' then 'Y' else 'N' end as col1_Y,
case when col2 = 'G' then 'Y' else 'N' end as col2_G,
case when col2 = 'Q' then 'Y' else 'N' end as col1_Q

From table) query

Group by col3, col4, col5
  • Related