Needless to say, I am not proficient at SQL. I now have to run a query on a table that looks like this:
id, tp_id, value_1, value_2, value_3, date
This table has 2 entries for each distinct tp_id, with different values. tp_id is an indexed foreign key that references the id column of the following table:
id, external_id
I'm trying to retrieve data as follows:
Get the distinct tp_id values where value_2 = 2, value_1 is 1 or 2, value_3 = 1, and date is at least one year in the past. These conditions must hold for BOTH entries with the same tp_id.
I have tried the following query, but as I understand it, the SUM function combined with the JOIN clause makes the query too slow:
SELECT t1.tp_id, t2.external_id
FROM table_1 t1
JOIN table_2 t2 ON t1.tp_id = t2.id
GROUP BY t1.tp_id
HAVING
    SUM(
        t1.value_2 = 2
        AND t1.value_1 IN (1, 2)
        AND t1.value_3 = 1
        AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
    ) = 2;
Both tables have roughly 2.5M rows.
I'd like to optimize this query or learn a better way to do this, so any help would be welcome. Thanks in advance.
EDIT: It appears that running this query will be unnecessary after all. I will therefore close the question. Thanks for the answers.
Answer:
If I understood your requirement correctly, something like this might help:
SELECT tp_id
FROM (
    SELECT t1.tp_id, COUNT(*) AS count
    FROM table_1 t1
    WHERE
        t1.value_2 = 2
        AND (t1.value_1 = 1 OR t1.value_1 = 2)
        AND t1.value_3 = 1
        AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
    GROUP BY tp_id
) AS res
WHERE res.count = 2;
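Essentially, I made 3 changes with performance in mind:
- The conditions are applied in the WHERE clause, before the GROUP BY, so non-matching rows are discarded before grouping instead of being evaluated per group in the HAVING clause. Because every tp_id has exactly 2 entries, a count of 2 after filtering means both entries matched.
- I used a nested query, but you can also drop the outer query and put HAVING COUNT(tp_id) = 2 right after the GROUP BY; that form works on any MySQL version.
- I replaced the IN clause with 2 equality checks. MySQL normally treats value_1 IN (1, 2) and value_1 = 1 OR value_1 = 2 the same way, so expect little or no gain from this one.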
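To make the filter-then-count idea easy to try out, here is a minimal sketch that runs the HAVING COUNT(*) = 2 variant and joins the result back to table_2 to get external_id, which the original query also selected. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database rather than MySQL, so datetime('now', '-1 year') stands in for DATE_SUB(NOW(), INTERVAL 1 YEAR); dates are stored as text; and the sample rows, the external_id values, and the index name are made up.

```python
import sqlite3

# Hypothetical miniature of the two tables from the question (SQLite, in memory).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, external_id TEXT);
CREATE TABLE table_1 (
    id INTEGER PRIMARY KEY,
    tp_id INTEGER REFERENCES table_2(id),
    value_1 INTEGER, value_2 INTEGER, value_3 INTEGER,
    date TEXT
);
CREATE INDEX idx_table_1_tp_id ON table_1 (tp_id);
""")

# Made-up sample data: exactly 2 rows in table_1 per tp_id, as in the question.
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [(1, "ext-a"), (2, "ext-b"), (3, "ext-c")])
conn.executemany(
    "INSERT INTO table_1 (tp_id, value_1, value_2, value_3, date) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 1, "2000-01-01 00:00:00"),  # tp_id 1: both rows match
        (1, 2, 2, 1, "2001-06-15 00:00:00"),
        (2, 1, 2, 1, "2000-01-01 00:00:00"),  # tp_id 2: second row fails value_1
        (2, 3, 2, 1, "2000-01-01 00:00:00"),
        (3, 1, 2, 1, "2000-01-01 00:00:00"),  # tp_id 3: second row is too recent
        (3, 1, 2, 1, "2999-01-01 00:00:00"),
    ],
)

# Filter rows first, keep only tp_ids where both rows survived the filter,
# then join the (small) result to table_2 to fetch external_id.
QUERY = """
SELECT t2.id, t2.external_id
FROM (
    SELECT tp_id
    FROM table_1
    WHERE value_2 = 2
      AND value_1 IN (1, 2)
      AND value_3 = 1
      AND date <= datetime('now', '-1 year')  -- SQLite stand-in for DATE_SUB
    GROUP BY tp_id
    HAVING COUNT(*) = 2
) AS matched
JOIN table_2 t2 ON t2.id = matched.tp_id
ORDER BY t2.id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

With this sample data only tp_id 1 should come back, since it is the only one whose two rows both pass the filter. On MySQL you would swap the date expression back to DATE_SUB(NOW(), INTERVAL 1 YEAR); whether this is actually faster on 2.5M rows depends on your indexes, so it is worth checking with EXPLAIN.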