Home > Software engineering >  Why is query by id with literal ids is much faster than query by subquery?
Why is query by id with literal ids is much faster than query by subquery?

Time:02-04

I build a query that sorts through more than 200k records and returns 28 records in 7s. It is the following:

select idcustomer
from sale s 
inner join customer c on c.idCustomer = s.fkCustomer
WHERE s.dateSale between '1700-01-01' and '2022-01-26'
    and c.fkCompany = 'b92c5957-9275-4fa5-9970-1a41eb524328'

This query is using the following index:

create index customer_fkCompanyx on dactai.customer(fkCompany, idcustomer);

I wanted to also query the columns c.firstname and c.lastname but when I add them to the above query the query takes 40s to return. I read that it might be because of RID and the database having to query the actual table instead of the index to get the additional columns. I then tried adding those two columns to the index creating a "cover index" but the columns are too large and extrapolate the max size of the index.

I then tried a different approach, I tried adding this query as a subquery like so:

select firstname from customer where idcustomer in (
    select idcustomer
    from sale s 
    inner join customer c on c.idCustomer = s.fkCustomer
    WHERE s.dateSale between '1700-01-01' and '2022-01-26'
        and c.fkCompany = 'b92c5957-9275-4fa5-9970-1a41eb524328'
)

This approach takes 20 seconds, so 13s more than the subquery on its own. The curious thing and the reason for this post is that the outer query with all 28 literal ids returns instantly.

Why would the outer query that takes 0s plus the sub query that takes 7s result in a query that takes 20s when the outer query with the same 28 ids but literal, takes 0 seconds? Also, is there a way to add the first and last names to the original query without it taking 40s?

CodePudding user response:

c:  INDEX(fkCompany, idCustomer)
s:  INDEX(dateSale, fkCustomer)
s:  INDEX(fkCustomer, dateSale)

Also, try

select  c.firstname
    from  customer AS c
    JOIN  sale AS s  ON c.idCustomer = s.fkCustomer
    WHERE  s.dateSale between '1700-01-01' AND '2022-01-26'
      and  c.fkCompany = 'b92c5957-9275-4fa5-9970-1a41eb524328' 

And change my first index suggestion to

c:  INDEX(fkCompany, idCustomer,  firstname, lastname)

For further discussion, please provide SHOW CREATE TABLE for each table.

  • Related