I'm trying to fetch a row from a table called export
with random weights. It should then fetch one row from another table export_chunk
which references the first row. This is the query:
SELECT * FROM export_chunk
WHERE export_id=(
SELECT id FROM export
WHERE schedulable=1
ORDER BY -LOG(1 - RAND())/export.weight LIMIT 1)
AND status='PENDING'
LIMIT 2;
The export
table can have 1000 rows while the export_chunk table can have millions of rows.
The query is very fast when the inner query returns a row. However, if there are no rows with schedulable=1
, the outer query performs a full table scan on export_chunk
. Why does this happen and is there any way to prevent it?
EDIT: Trying COALESCE()
Akina in the comments suggested using COALESCE, ie.:
SELECT * FROM export_chunk
WHERE export_id=COALESCE(
SELECT id FROM export
WHERE schedulable=1
ORDER BY -LOG(1 - RAND())/export.weight LIMIT 1)
,-1)
AND status='PENDING'
LIMIT 2;
This should work. When I run:
SELECT COALESCE((SELECT id FROM export WHERE schedulable=1 ORDER BY -LOG(1-RAND())/export.weight LIMIT 1), -1) FROM export;
It does return -1 for each row which Akina predicted. And if I manually search for -1 instead of the inner query it returns no rows very quickly. However, when I try to use COALESCE on the inner query it is still really slow. I do not understand why.
CodePudding user response:
Test this:
SELECT export_chunk.*
FROM export_chunk
JOIN ( SELECT id
FROM export
WHERE schedulable=1
ORDER BY -LOG(1 - RAND())/export.weight
LIMIT 1 ) AS random_row ON export_chunk.export_id=random_row.id
WHERE export_chunk.status='PENDING'
LIMIT 2;
Does this matches needed logic? especially when no matching rows in the subquery - do you need none output rows (like now) or any 2 rows in this case?
PS. LIMIT without ORDER BY in outer query is strange.