Home > front end >  how to optimize query searching for similar ids referenced in different tables
how to optimize query searching for similar ids referenced in different tables

Time:07-24

in my query here https://www.db-fiddle.com/f/nfJzZoYC5gEXLu8hrw4JT2/1

SELECT id, COUNT(DISTINCT tc1.co_id), COUNT(DISTINCT tc2.co_id)
FROM test t
INNER JOIN test_corelation_1 tc1 ON tc1.test_id = t.id AND tc1.co_id IN (
    SELECT co_id 
    FROM test_corelation_1
    WHERE test_id IN (1, 2, 5)
    GROUP BY co_id
)
INNER JOIN test_corelation_2 tc2 ON tc2.test_id = t.id AND tc2.co_id IN (
    SELECT co_id 
    FROM test_corelation_2
    WHERE test_id IN (1, 2, 5)
    GROUP BY co_id
)
GROUP BY t.id
ORDER BY (COUNT(DISTINCT tc1.co_id)   COUNT(DISTINCT tc2.co_id)) ASC;

i am trying getting all the ids from table test that shares similar ids corelated to the ids 1, 2, 3 then sorting it by the least similar by counting it which results in this

id COUNT(DISTINCT tc1.co_id) COUNT(DISTINCT tc2.co_id)
3 1 3
2 3 7
1 5 6

but it gets very very slow the more ids i am checking for its similarities and i do not know how to optimize it further from this and i thought of using CTE but it had same results in the optimizer explain

CodePudding user response:

You seem to have written the query in a very awkward way. I would write it this way, using direct joins to subqueries which find the counts for the two test correlation tables:

SELECT t.id, COALESCE(tc1.cnt1, 0) AS cnt1, COALESCE(tc2.cnt2, 0) AS cnt2
FROM test t
LEFT JOIN
(
    SELECT co_id, COUNT(*) AS cnt1
    FROM test_corelation_1
    WHERE test_id IN (1, 2, 5)
    GROUP BY co_id
) tc1
    ON tc1.test_id = t.id
LEFT JOIN
(
    SELECT co_id, COUNT(*) AS cnt2
    FROM test_corelation_2
    WHERE test_id IN (1, 2, 5)
    GROUP BY co_id
) tc2
    ON tc2.test_id = t.id
ORDER BY cnt1   cnt2;

Count queries are notoriously difficult to optimize, but the subqueries above have a WHERE clause, so the following indices might help:

CREATE INDEX tc_idx_1 ON test_corelation_1 (test_id, co_id);
CREATE INDEX tc_idx_1 ON test_corelation_2 (test_id, co_id);

CodePudding user response:

The main means we have to speed up queries are indexes. But of course we should ensure that our queries are as straight-forward as possible. In your case you don't need the test table at all. Just get your counts from the two other tables and join them. As you are only interested in test IDs that exist in both tables, this is a mere inner join.

select test_id, c1.count_1, c2.count_2
from
(
  select test_id, count(*) as count_1
  from test_corelation_1
  where co_id in (select co_id from test_corelation_1 where test_id in (1, 2, 5))
  group by test_id
) c1
join
(
  select test_id, count(*) as count_2
  from test_corelation_2
  where co_id in (select co_id from test_corelation_2 where test_id in (1, 2, 5))
  group by test_id
) c2 using (test_id)
order by c1.count_1   c2.count_2, test_id;

I recommend these indexes for the query:

create index idx1 on test_corelation_1 (co_id, test_id);
create index idx2 on test_corelation_2 (co_id, test_id);

(In case the DBMS wants to work with indexes on (co_id, test_id), too, it already has these indexes, as these are the tables' primary keys.)

  • Related