Optimizing sql query: check for all rows in table B if any rows in table C reference the same row in-CodePudding

I have 3 tables, A, B and C structured like this

CREATE TABLE a (
    id SERIAL NOT NULL PRIMARY KEY
);

CREATE TABLE b (
    id SERIAL NOT NULL PRIMARY KEY,
    a_id INT REFERENCES a(id) ON DELETE CASCADE
);

CREATE TABLE c (
    id SERIAL NOT NULL PRIMARY KEY,
    a_id INT REFERENCES a(id) ON DELETE CASCADE
);

Where the relationships are many-to-one. What i want is, for every row in table b, i want to check if any row in table c has a reference to the same row in table a. Now, I already have the query

SELECT
    b.id,
    true
FROM
    b
WHERE EXISTS (
    SELECT 1
    FROM c
    WHERE b.a_id = c.a_id
)
UNION
SELECT
    b.id,
    false
FROM
    b
WHERE NOT EXISTS (
    SELECT 1
    FROM c
    WHERE b.a_id = c.a_id
)
ORDER BY id

Though I am not certain, I think this is doing double work, and going through the table twice, and I am wondering how I could optimize it to only traverse the table once.

Is it possible with a simple query, or do I have to do anything complex?

CodePudding user response：

Simply move the EXISTS clause into your SELECT clause.

SELECT
  b.id,
  EXISTS (SELECT null FROM c WHERE c.a_id = b.a_id) AS c_exists
FROM b;

The same with an IN clause, which I prefer for being even a tad simpler:

SELECT
  id,
  a_id IN (SELECT c.a_id FROM c) AS c_exists
FROM b;

CodePudding user response：

This can be done with a subquery, a left join, and a case.

The subquery gets you a list of distinct c.a_id values.

             SELECT DISTINCT a_id FROM c;

Then do this

SELECT b.id, 
       CASE WHEN distinct_ids.a_id IS NULL THEN 'false'
            ELSE 'true' END has_c_row
  FROM b
  LEFT JOIN (
                SELECT DISTINCT a_id FROM c;
       ) distinct_ids ON b.a_id = distinct_ids.a_id

This shape of query is called an antijoin or IS NULL ... LEFT JOIN. It detects the rows in the first table that don't match rows in the second table.

The subquery gives us a view of the data in table c with at most one row per each distinct a_id value. Without the subquery, we might get duplicate rows in the result query.

This eliminates your WHERE EXISTS correlated subqueries; even though PostgreSQL's query planner is pretty smart, sometimes it does the slow thing with subqueries like that.

If it is still too slow for you, create these indexes on the a_id columns.

ALTER TABLE b ADD INDEX a_id (a_id);
ALTER TABLE c ADD INDEX a_id (a_id);

CodePudding user response：

i think i understand what you are after

this is how i would do it

SELECT b.id, ISNULL(res.result,0) as result
FROM b
LEFT JOIN (
    SELECT c.id, 1 as result
    FROM c
    INNER JOIN a on a.id = c.id
) res on b.id = res.id

i dont think you need to worry about distinct if they are all unique ids