I have two tables in my postgreSQL database, Completion and path_completion. Completion table has 12 columns and Path_Completion has 11 columns where all are same except one extra column in the completion table Both tables has common rows. I want to get those rows which are not present in Completion table but are there in the Path_completion
Completion table -
Path_Completion table-
I would like to have my result as follows-
Logic being id- 4, event - le155 is present is both tables and status for this in completion table is "\N"
I tried the following but this didn't work-
select *
from path_completion
where (unique_id,event,status) not in
( select unique_id,event,status from completion where status IN ('Completed'))
CodePudding user response:
If your tables have a large data (approximately - over 100000) then using not in
gets very bad performance. Recommended using join tables
instead of not in
. For example:
select * from path_completion pc
inner join completion cc on pc.unique_id = cc.unique_id
where cc.status <> 'Completed'
This gets only data which has an in same tables and status <> 'Completed'. If you need select data that are maybe not in completion
table, so use this query:
select * from path_completion pc
left join completion cc on pc.unique_id = cc.unique_id and cc.status <> 'Completed'
I recommended to you using on conditions for example status is null (or status = '\N') instead of status <> 'Completed' for getting the best performance and for always using index scan table mode.
CodePudding user response:
Sample Query for full testing:
CREATE TABLE test.completion (
id int4 NULL,
ds date NULL,
"event" varchar(100) NULL,
time_duration interval NULL,
event_type varchar(100) NULL,
status varchar(100) NULL,
ranking int4 NULL
);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(1, '2022-03-02', 'le100', '8 days'::interval, 'xyz', 'Completed', 1);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(2, '2022-03-18', 'le108', '5 days'::interval, 'pqr', 'Completed', 1);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(3, '2022-03-19', 'le140', '13 days'::interval, 'abc', 'Completed', 1);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(4, '2022-03-25', 'le155', '12 days'::interval, 'mno', '\N', 2);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(5, '2022-03-25', 'le160', '4 days'::interval, 'abc', '\N', 2);
CREATE TABLE test.path_completion (
id int4 NULL,
ds date NULL,
"event" varchar(100) NULL,
time_duration interval NULL,
event_type varchar(100) NULL,
status varchar(100) NULL
);
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(1, '2022-03-02', 'le100', '8 days'::interval, 'xyz', 'Path_complete');
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(2, '2022-03-18', 'le108', '5 days'::interval, 'pqr', 'Path_complete');
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(3, '2022-03-19', 'le140', '13 days'::interval, 'abc', 'Path_complete');
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(4, '2022-03-25', 'le155', '12 days'::interval, 'mno', 'Path_complete');
-- Sample Query:
select pc.* from path_completion pc
inner join completion cc on pc.id = cc.id and pc."event" = cc."event"
where cc.status <> 'Completed';
-- Result
id ds event time_duration event_type status
-------------------------------------------------------------------
4 2022-03-25 le155 12 days mno Path_complete