MySQL : Optimizing simple query

Time:01-27

Needless to say, I am not proficient at SQL. I now have to run a query on a table that looks like this:

 id, tp_id, value_1, value_2, value_3, date

This table has two entries for each distinct tp_id, with different values. tp_id is an indexed foreign key that references the following table:

 id, external_id

I'm trying to retrieve data as follows:

Get the distinct tp_id values where value_2 = 2, value_1 is 1 or 2, value_3 = 1, and date is more than one year in the past. These conditions must hold for BOTH entries sharing the same tp_id.

I have tried the following query, but as I understand it, the SUM function combined with the JOIN makes the query too slow:

SELECT t1.tp_id, t2.external_id
FROM table_1 t1
JOIN table_2 t2 ON t1.tp_id = t2.id
GROUP BY t1.tp_id
HAVING 
  SUM(
    t1.value_2 = 2 
    AND t1.value_1 IN (1, 2) 
    AND t1.value_3 = 1 
    AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
  ) = 2;

Both tables have roughly 2.5M rows.

I'd like to optimize this query or learn a better way to do this, so any help would be welcome. Thanks in advance.

EDIT: It appears that running this query will be unnecessary altogether, so I will close the question. Thanks for the answers.

CodePudding user response:

If I understood your requirement correctly, something like this might help:

SELECT res.tp_id
FROM (
    SELECT t1.tp_id, COUNT(*) AS cnt
    FROM table_1 t1
    WHERE
      t1.value_2 = 2
      AND (t1.value_1 = 1 OR t1.value_1 = 2)
      AND t1.value_3 = 1
      AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
    GROUP BY t1.tp_id
) AS res
WHERE res.cnt = 2;

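Essentially, I made three performance-related changes:

  1. The WHERE condition is applied before the GROUP BY, so non-matching rows are discarded before grouping. This is far cheaper than evaluating the condition in HAVING after every group has been built.
  2. I used a nested query, but you can also keep a single query level and use HAVING COUNT(tp_id) = 2 instead.
  3. Two explicit equality checks may be marginally faster than an IN clause, although MySQL typically optimizes both forms the same way, so measure before relying on this.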
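
For reference, the single-level form mentioned in point 2 might look roughly like the sketch below. It reuses the table and column names from the question. The join back to table_2 is only an assumption that external_id is still wanted in the output, as in the original query. The sketch also relies on the stated fact that every tp_id has exactly two rows, so a count of 2 after filtering means both rows matched.

```sql
-- Sketch only: names come from the question; the join to table_2 is an
-- assumption about the desired output (tp_id plus external_id).
SELECT t1.tp_id, t2.external_id
FROM table_1 t1
JOIN table_2 t2 ON t2.id = t1.tp_id
WHERE t1.value_2 = 2
  AND (t1.value_1 = 1 OR t1.value_1 = 2)
  AND t1.value_3 = 1
  AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
-- external_id is listed as well so the query is valid under ONLY_FULL_GROUP_BY
GROUP BY t1.tp_id, t2.external_id
-- both rows of a tp_id must have survived the WHERE filter
HAVING COUNT(*) = 2;
```

Because the WHERE clause runs first, only rows that already satisfy the conditions are joined and grouped, which is the same idea as the nested version above.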