Home > Enterprise >  Query using HAVING taking too long
Query using HAVING taking too long

Time:10-03

I have a query over two tables -- matchoverview

id, home_id, away_id, date, season, result

matchattributes

id, game_id, attribute_id, attribute_value

My query

select m.id from matchOverview m
  join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (3,4,5,6)
  group by m.id
     having sum(case when ma.attribute_id in (3,4)
     then ma.attribute_value end) > 3
     or sum(case when ma.attribute_id in (5,6)
     then ma.attribute_value end) > 3;

Which returns all match ids where the sum of attributes 3 and 4 or 5 and 6 is greater than 3.

This particular query returns 900k rows, unsurprisingly in phpmyadmin this query takes a deal of time, as I imagine it needs to format the results into a table, but it clocks the query at .0113 seconds.

Yet when I make this query over PHP it takes 15 seconds, if I alter the query to LIMIT to only 100 results, it runs almost instantly, leaving me with the belief the only possibility being the amount of data being transferred is what is slowing it.

But would it really take 15 seconds to transfer 1M 4 byte ints over the network?

Is the only solution to further limit the query so that it returns less results?

EDIT

Results of an EXPLAIN on my query

id  select_type  table  type   key             key     key_len ref                 rows    Extra
1   SIMPLE       m      index  PRIMARY         PRIMARY 4       NULL                2790717 Using index
1   SIMPLE       ma     ref    match,attribute match   4       opta_matches2.m.id  2       Using where

How I am timing my SQL query

$time_pre = microtime(true);
$quer = $db->query($sql);
$time_post = microtime(true);
$exec_time = $time_post - $time_pre;

Data from slow query log

# Thread_id: 15  Schema: opta_matches2  QC_hit: No
# Query_time: 15.594386  Lock_time: 0.000089  Rows_sent: 923962  Rows_examined: 15688514
# Rows_affected: 0  Bytes_sent: 10726615

I am ok with dealing with a 15 second query if it is because that is how long it takes the data to move over the network, but if the query or my table can be optimized that is the best solution

The row count is not the issue, the following query

select m.id from matchOverview m
  join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (1,2,3,4)
 group by m.id
  having sum(case when ma.attribute_id in (3,4)
   then ma.attribute_value end) > 8
  and sum(case when ma.attribute_id in (1,2)
   then ma.attribute_value end) = 0;

returns only 24 rows but also takes ~15 seconds

CodePudding user response:

phpMyAdmin doesn't give you all results, it also using limit to default 25 results.

If you change this limit by changing "Number of rows" select box or type the limit in query, It will take more time to run the query.

CodePudding user response:

I think if you rewrote the conditions, at a minimum you might find something out. For instance, I think this does the same as the second example (the 24 results one);

SELECT
   m.id
 , at.total_12
 , at.total_34
FROM matchOverview AS m
JOIN (
    SELECT
       m.id
     , SUM(IF (ma.attribute_id IN(1,2), ma.attribute_value, 0)) AS total_12
     , SUM(IF (ma.attribute_id IN(3,4), ma.attribute_value, 0)) AS total_34
    FROM matchAttributes AS ma
    WHERE m.id = ma.match_id
    AND ma.attribute_id IN(1,2,3,4)
    GROUP BY m.id
) AS at
WHERE at.total_12 > 0
AND at.total_34 > 8

It's more verbose, but it could help triangulate where the bottleneck(s) come from more readily.

For instance, if (a working) version of the above is still slow, then run the inner query with the GROUP BY intact. Still slow? Remove the GROUP BY. Move the GROUP BY/SUM into the outer query, what happens?

That kinda thing. I can't run it so I can't work out a more precise answer, which I would like to know.

  • Related