Hi stackoverflow I have here a sql statement that is pretty slow performing which I think is due to the subquery seen in the sql below. My question is simply given that this subquery and the fact that it must set 'exists' to 0 or 1 given the logic can this be improved.
SELECT
p.id,
p.name,
(
SELECT
COUNT(*) > 0
FROM
product_log AS pl
WHERE
pl.product_id = p.id
AND
pl.state_name in ("Creation", "Auction", ...)
) AS 'has_log'
from product as p;
Table creations for log and product
CREATE TABLE `product_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`product_id` int(11) DEFAULT NULL,
`state_name` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`create_datetime` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `state_name` (`state_name`,`create_datetime`)
(`id`,`product_id`,`state_name`,`create_datetime`)
) ENGINE=InnoDB AUTO_INCREMENT=8132540 DEFAULT CHARSET=utf8;
Running explain on the query, is a bit different since much of the details where omitted compared to the actual query that is run but here is the output
|id |select_type |table|partitions|type |possible_keys |key |key_len|ref |rows |filtered|Extra |
|---|------------------|-----|----------|------|-----------------------------------------------------------------|-------------------|-------|-----------------------------|------|--------|--------------------------------------------|
|1 |PRIMARY |p | |eq_ref|PRIMARY,guarantee_list_index |PRIMARY |4 |ppd.product_id |1 |100 | |
|2 |DEPENDENT SUBQUERY|pl | |ref |product_id,state_name |product_id |5 |p.id |3 |51.5 |Using where |
CodePudding user response:
You don't have to count all matches. Instead just check whether at least one match exists with EXISTS
or IN
.
SELECT
p.id,
p.name,
EXISTS
(
SELECT null
FROM product_log AS pl
WHERE pl.product_id = p.id
AND pl.state_name in ('Creation', 'Auction', ...)
) AS has_log
FROM product AS p;
or
SELECT
p.id,
p.name,
p.id IN
(
SELECT pl.product_id
FROM product_log AS pl
WHERE pl.state_name in ('Creation', 'Auction', ...)
) AS has_log
FROM product AS p;
Both queries do the same thing, so the DBMS can can up with the same execution plan for both. But the queries show there are two ways the DBMS may approach the task:
- Loop through all products and look up each product's logs.
- Find all products with a requested log and use this list for the lookup.
These approaches require different indexes. If I want to lookup the logs for a product, I want:
CREATE INDEX idx1 ON product_log (product_id, state_name);
If I want to build the lookup list first, I want:
CREATE INDEX idx2 ON product_log (state_name, product_id);
I suggest you create both indexes. Then see the explain plan for the query, see which one is used and drop the other.
CodePudding user response:
something like?
select p.id, p.name, CASE WHEN pl.id IS NULL THEN 0 ELSE 1 END AS has_log
from product as p
left join product_log AS pl on pl.product_id = p.id AND pl.state_name in ("Creation", "Auction", ...)
eliminating the sub select executed each row, and avoid using count() having to go over all records if you only need exists