I have some data which I'm trying to clean in order to run further analysis on. One of the columns in the table is called record_type, this can either be NEW or DEL. This means that initially a NEW record might be added but then a DEL record would come in later to say that particular record is now expired (NEW and DEL records would be matched on the record_id). However both the NEW and DEL record would stay in the data, it doesn't get deleted.
So what I had planned to do is to create two CTEs, one for DEL records only and one for NEW records only:
WITH deleted_rec AS(
SELECT *
FROM main_table
WHERE record_type = 'DEL'
)
, new_rec AS(
SELECT *
FROM main_table
WHERE record_type = 'NEW'
)
Then outer join on the record_id column in both the CTEs.
SELECT *
FROM new_rec
FULL OUTER JOIN deleted_rec ON deleted_rec.record_id = new_rec.record_id
The goal would have been for the output to only include records which haven't had a DEL record come in for that record_id so that way I can guarantee that all the type NEW records I have in my final table would not have had a DEL record come in for them at any point and they would therefore all be active. However, I had forgotten that FULL OUTER JOIN return everything rather than just what didn't match so is there a way to get around this to get my desired output?
CodePudding user response:
I would just use a single query with exists logic:
SELECT *
FROM main_table t1
WHERE record_type = 'NEW' AND
NOT EXISTS (SELECT 1 FROM main_table t2
WHERE t2.id = t1.id AND t2.record_type = 'DEL');
In plain English, the above query says to find all records which are NEW
which also do not have associated with the same id
another record having DEL
.