How to extract data rows from a table not present in the second table-CodePudding

I have two tables in my postgreSQL database, Completion and path_completion. Completion table has 12 columns and Path_Completion has 11 columns where all are same except one extra column in the completion table Both tables has common rows. I want to get those rows which are not present in Completion table but are there in the Path_completion

Completion table -

Path_Completion table-

I would like to have my result as follows-

Logic being id- 4, event - le155 is present is both tables and status for this in completion table is "\N"

I tried the following but this didn't work-

select * 
from path_completion
where (unique_id,event,status) not in
( select unique_id,event,status from completion where status IN ('Completed'))

CodePudding user response：

If your tables have a large data (approximately - over 100000) then using not in gets very bad performance. Recommended using join tables instead of not in. For example:

select * from path_completion pc 
inner join completion cc on pc.unique_id = cc.unique_id 
where cc.status <> 'Completed'

This gets only data which has an in same tables and status <> 'Completed'. If you need select data that are maybe not in completion table, so use this query:

select * from path_completion pc 
left join completion cc on pc.unique_id = cc.unique_id and cc.status <> 'Completed'

I recommended to you using on conditions for example status is null (or status = '\N') instead of status <> 'Completed' for getting the best performance and for always using index scan table mode.

CodePudding user response：

Sample Query for full testing:

CREATE TABLE test.completion (
    id int4 NULL,
    ds date NULL,
    "event" varchar(100) NULL,
    time_duration interval NULL,
    event_type varchar(100) NULL,
    status varchar(100) NULL,
    ranking int4 NULL
);

INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking) 
VALUES(1, '2022-03-02', 'le100', '8 days'::interval, 'xyz', 'Completed', 1);

INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking) 
VALUES(2, '2022-03-18', 'le108', '5 days'::interval, 'pqr', 'Completed', 1);

INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking) 
VALUES(3, '2022-03-19', 'le140', '13 days'::interval, 'abc', 'Completed', 1);

INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking) 
VALUES(4, '2022-03-25', 'le155', '12 days'::interval, 'mno', '\N', 2);

INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking) 
VALUES(5, '2022-03-25', 'le160', '4 days'::interval, 'abc', '\N', 2);

CREATE TABLE test.path_completion (
    id int4 NULL,
    ds date NULL,
    "event" varchar(100) NULL,
    time_duration interval NULL,
    event_type varchar(100) NULL,
    status varchar(100) NULL
);

INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status) 
VALUES(1, '2022-03-02', 'le100', '8 days'::interval, 'xyz', 'Path_complete');

INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status) 
VALUES(2, '2022-03-18', 'le108', '5 days'::interval, 'pqr', 'Path_complete');

INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status) 
VALUES(3, '2022-03-19', 'le140', '13 days'::interval, 'abc', 'Path_complete');

INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status) 
VALUES(4, '2022-03-25', 'le155', '12 days'::interval, 'mno', 'Path_complete');


-- Sample Query: 
select pc.* from path_completion pc 
inner join completion cc on pc.id = cc.id and pc."event" = cc."event"  
where cc.status <> 'Completed';


-- Result 
id  ds          event   time_duration   event_type  status
-------------------------------------------------------------------
4   2022-03-25  le155   12 days         mno         Path_complete