I have to compare values in 3 columns of the same table for every group based on id. It's better to illustrate what I am trying to achieve :
create table test1 (id nvarchar(8), Name1 nvarchar(10), current_sem nvarchar(10), next_sem nvarchar(10), prev_sem nvarchar(10))
INSERT INTO test1 VALUES ('R001', 'Michael', 'Physics', 'Maths', 'Physics')
INSERT INTO test1 VALUES ('R001', 'Michael', 'Physics', 'Maths', 'Chemistry')
INSERT INTO test1 VALUES ('R003', 'Tim', 'Physics', 'Maths', 'Maths')
INSERT INTO test1 VALUES ('R002', 'John', 'Physics', 'Maths', 'Commerce')
INSERT INTO test1 VALUES ('R003', 'Tim', 'Maths', 'Maths', 'Physics')
INSERT INTO test1 VALUES ('R002', 'John', 'Maths', 'Commerce', 'Physics')
INSERT INTO test1 VALUES ('R002', 'John', 'Commerce', 'Physics', 'Maths')
INSERT INTO test1 VALUES ('R003', 'Tim', 'History', 'Civics', 'HomeEc')
INSERT INTO test1 VALUES ('R003', 'Tim', 'Drama', 'Chemistry', 'HomeEc')
The last 3 columns in the table are : current_sem
, next_sem
, and prev_sem
.
For each id I want to find the values of next_sem which is or are not present in either current_sem or prev_sum and display under a pseudo column named "Sub_to_add".
And for the same id there is another pseudo_column named "Sub_to_remove" which contains value of current_sem which is not present in either next_sem or prev_sem. So if we see the above data id R001 (Michael) will have Maths as sub_to_add because Maths is not present in current_sem or prev_sem for R001.
R002 should ideally not be displayed because all the next_sem values are present in either current_sem or prev_sem. Similarly nothing to sub_to_remove.
R003 will have Civics;Chemistry under sub_to_add and History:Drama under sub_to_remove.
I am not sure how to show the output in code, so I am adding a screenshot.
I cannot use (next_sem <> current_sem OR next_sem <> prev_sem)
in the WHERE
clause because a value can be present in another row for same column - example of R002.
id | Name1 | Sub_to_add | Sub_to_remove
--------------------------------------------------------------
R001 | Michael | Maths |
R003 | Tim | Civics:Chemistry | History;Drama
So, I am seeking help here. This query will be used on SQL Server 2019 v15.
CodePudding user response:
Thank you for this reminder of how grateful I am to be working with PostgreSQL instead of SQL Server nowadays. This was downright painful mostly because of your data model but also because of SQL Server's limitations.
I normalized your data, and that simplified calculating the necessary changes.
with norm as (
select id, Name1, 'current' as sem, current_sem as subj from test1
union
select id, Name1, 'next' as sem, next_sem as subj from test1
union
select id, Name1, 'prev' as sem, prev_sem as subj from test1
), rules as (
select id, Name1, 'add' as change, subj
from norm t
where sem = 'next'
and not exists (
select 1
from norm
where id = t.id
and sem != t.sem
and subj = t.subj
)
union all
select id, Name1, 'remove' as change, subj
from norm t
where sem = 'current'
and not exists (
select 1
from norm
where id = t.id
and sem != t.sem
and subj = t.subj
)
)
select t.id, t.Name1,
string_agg(
case
when r.change = 'add' then subj
else null
end,
';'
) as sub_to_add,
string_agg(
case
when r.change = 'remove' then subj
else null
end,
';'
) as sub_to_remove
from (select distinct id, Name1 from test1) t
left join rules r
on r.id = t.id
group by t.id, t.Name1
order by t.id
;
db<>fiddle here