in my query here https://www.db-fiddle.com/f/nfJzZoYC5gEXLu8hrw4JT2/1
SELECT id, COUNT(DISTINCT tc1.co_id), COUNT(DISTINCT tc2.co_id)
FROM test t
INNER JOIN test_corelation_1 tc1 ON tc1.test_id = t.id AND tc1.co_id IN (
SELECT co_id
FROM test_corelation_1
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
)
INNER JOIN test_corelation_2 tc2 ON tc2.test_id = t.id AND tc2.co_id IN (
SELECT co_id
FROM test_corelation_2
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
)
GROUP BY t.id
ORDER BY (COUNT(DISTINCT tc1.co_id) COUNT(DISTINCT tc2.co_id)) ASC;
i am trying getting all the ids from table test
that shares similar ids corelated to the ids 1, 2, 3
then sorting it by the least similar by counting it which results in this
id | COUNT(DISTINCT tc1.co_id) | COUNT(DISTINCT tc2.co_id) |
---|---|---|
3 | 1 | 3 |
2 | 3 | 7 |
1 | 5 | 6 |
but it gets very very slow the more ids i am checking for its similarities and i do not know how to optimize it further from this and i thought of using CTE but it had same results in the optimizer explain
CodePudding user response:
You seem to have written the query in a very awkward way. I would write it this way, using direct joins to subqueries which find the counts for the two test correlation tables:
SELECT t.id, COALESCE(tc1.cnt1, 0) AS cnt1, COALESCE(tc2.cnt2, 0) AS cnt2
FROM test t
LEFT JOIN
(
SELECT co_id, COUNT(*) AS cnt1
FROM test_corelation_1
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
) tc1
ON tc1.test_id = t.id
LEFT JOIN
(
SELECT co_id, COUNT(*) AS cnt2
FROM test_corelation_2
WHERE test_id IN (1, 2, 5)
GROUP BY co_id
) tc2
ON tc2.test_id = t.id
ORDER BY cnt1 cnt2;
Count queries are notoriously difficult to optimize, but the subqueries above have a WHERE
clause, so the following indices might help:
CREATE INDEX tc_idx_1 ON test_corelation_1 (test_id, co_id);
CREATE INDEX tc_idx_1 ON test_corelation_2 (test_id, co_id);
CodePudding user response:
The main means we have to speed up queries are indexes. But of course we should ensure that our queries are as straight-forward as possible. In your case you don't need the test table at all. Just get your counts from the two other tables and join them. As you are only interested in test IDs that exist in both tables, this is a mere inner join.
select test_id, c1.count_1, c2.count_2
from
(
select test_id, count(*) as count_1
from test_corelation_1
where co_id in (select co_id from test_corelation_1 where test_id in (1, 2, 5))
group by test_id
) c1
join
(
select test_id, count(*) as count_2
from test_corelation_2
where co_id in (select co_id from test_corelation_2 where test_id in (1, 2, 5))
group by test_id
) c2 using (test_id)
order by c1.count_1 c2.count_2, test_id;
I recommend these indexes for the query:
create index idx1 on test_corelation_1 (co_id, test_id);
create index idx2 on test_corelation_2 (co_id, test_id);
(In case the DBMS wants to work with indexes on (co_id, test_id), too, it already has these indexes, as these are the tables' primary keys.)