Write a query that checks for traffic inconsistencies. An inconsistency is when a visit (url, uid, dt, src, rev) involves a source page src that was never visited by user uid. Return the key of the visit (url, uid, dt).
This is the table I am working with:
The primary key is url, uid, dt for the visit table.
Visit:
url | dt | uid | src | rev |
---|---|---|---|---|
A02 | 05/18/2003 | A | A05 | 20 |
A03 | 05/19/2003 | B | A01 | 15 |
A01 | 5/20/2004 | B | A02 | 10 |
I would want it to return me these results:
url | dt | uid | src | rev |
---|---|---|---|---|
A02 | 05/18/2003 | A | A05 | 20 |
A01 | 05/20/2004 | B | A02 | 10 |
This is because A05 which is the source of A02 was never visited by uid. A03 visit would not be returned because the source was visited by B before.
How would I write a query that returns this?
First, I did try connecting the two Visit tables together in a self Join on the src and url:
Select distinct V1.url, V1.uid
FROM Visit V1, Visit V2
WHERE V1.src = V2.url
But I am unsure where to go from here. I know that group by maybe be needed.
CodePudding user response:
NOT EXISTS
helps find these records.
SELECT *
FROM Visit a
WHERE NOT EXISTS (
SELECT 1
FROM Visit b
WHERE b.url = a.src
AND b.uid = a.uid
)