Compare 3 columns in same table for a group of rows-CodePudding

I have to compare values in 3 columns of the same table for every group based on id. It's better to illustrate what I am trying to achieve :

create table test1 (id nvarchar(8), Name1 nvarchar(10), current_sem nvarchar(10), next_sem nvarchar(10), prev_sem nvarchar(10))

INSERT INTO test1 VALUES ('R001', 'Michael', 'Physics', 'Maths', 'Physics')
INSERT INTO test1 VALUES ('R001', 'Michael', 'Physics', 'Maths', 'Chemistry')
INSERT INTO test1 VALUES ('R003', 'Tim', 'Physics', 'Maths', 'Maths')
INSERT INTO test1 VALUES ('R002', 'John', 'Physics', 'Maths', 'Commerce')
INSERT INTO test1 VALUES ('R003', 'Tim', 'Maths', 'Maths', 'Physics')
INSERT INTO test1 VALUES ('R002', 'John', 'Maths', 'Commerce', 'Physics')
INSERT INTO test1 VALUES ('R002', 'John', 'Commerce', 'Physics', 'Maths')
INSERT INTO test1 VALUES ('R003', 'Tim', 'History', 'Civics', 'HomeEc')
INSERT INTO test1 VALUES ('R003', 'Tim', 'Drama', 'Chemistry', 'HomeEc')

The last 3 columns in the table are : current_sem, next_sem, and prev_sem.

For each id I want to find the values of next_sem which is or are not present in either current_sem or prev_sum and display under a pseudo column named "Sub_to_add".

And for the same id there is another pseudo_column named "Sub_to_remove" which contains value of current_sem which is not present in either next_sem or prev_sem. So if we see the above data id R001 (Michael) will have Maths as sub_to_add because Maths is not present in current_sem or prev_sem for R001.

R002 should ideally not be displayed because all the next_sem values are present in either current_sem or prev_sem. Similarly nothing to sub_to_remove.

R003 will have Civics;Chemistry under sub_to_add and History:Drama under sub_to_remove.

I am not sure how to show the output in code, so I am adding a screenshot.

I cannot use (next_sem <> current_sem OR next_sem <> prev_sem) in the WHERE clause because a value can be present in another row for same column - example of R002.

id      |   Name1   |   Sub_to_add          | Sub_to_remove
--------------------------------------------------------------
R001    |   Michael |   Maths               |
R003    |   Tim     |   Civics:Chemistry    | History;Drama

So, I am seeking help here. This query will be used on SQL Server 2019 v15.

CodePudding user response：

Thank you for this reminder of how grateful I am to be working with PostgreSQL instead of SQL Server nowadays. This was downright painful mostly because of your data model but also because of SQL Server's limitations.

I normalized your data, and that simplified calculating the necessary changes.

with norm as (
  select id, Name1, 'current' as sem, current_sem as subj from test1
  union
  select id, Name1, 'next' as sem, next_sem as subj from test1
  union
  select id, Name1, 'prev' as sem, prev_sem as subj from test1
), rules as (
  select id, Name1, 'add' as change, subj
    from norm t
   where sem = 'next'
     and not exists (
           select 1
             from norm
            where id = t.id 
              and sem != t.sem
              and subj = t.subj
       )
  union all
  select id, Name1, 'remove' as change, subj
    from norm t
   where sem = 'current'
     and not exists (
           select 1
             from norm
            where id = t.id
              and sem != t.sem
              and subj = t.subj
       )
)
select t.id, t.Name1, 
       string_agg(
         case
           when r.change = 'add' then subj
           else null
         end,
         ';'
       ) as sub_to_add,
       string_agg(
         case 
           when r.change = 'remove' then subj
           else null
         end,
         ';'
       ) as sub_to_remove
  from (select distinct id, Name1 from test1) t
       left join rules r 
         on r.id = t.id
 group by t.id, t.Name1
 order by t.id
 ;

db<>fiddle here