I have a query over two tables -- matchoverview
id, home_id, away_id, date, season, result
matchattributes
id, game_id, attribute_id, attribute_value
My query
select m.id from matchOverview m
join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (3,4,5,6)
group by m.id
having sum(case when ma.attribute_id in (3,4)
then ma.attribute_value end) > 3
or sum(case when ma.attribute_id in (5,6)
then ma.attribute_value end) > 3;
Which returns all match ids where the sum of attributes 3 and 4 or 5 and 6 is greater than 3.
This particular query returns 900k rows, unsurprisingly in phpmyadmin this query takes a deal of time, as I imagine it needs to format the results into a table, but it clocks the query at .0113 seconds.
Yet when I make this query over PHP it takes 15 seconds, if I alter the query to LIMIT to only 100 results, it runs almost instantly, leaving me with the belief the only possibility being the amount of data being transferred is what is slowing it.
But would it really take 15 seconds to transfer 1M 4 byte ints over the network?
Is the only solution to further limit the query so that it returns less results?
EDIT
Results of an EXPLAIN on my query
id select_type table type key key key_len ref rows Extra
1 SIMPLE m index PRIMARY PRIMARY 4 NULL 2790717 Using index
1 SIMPLE ma ref match,attribute match 4 opta_matches2.m.id 2 Using where
How I am timing my SQL query
$time_pre = microtime(true);
$quer = $db->query($sql);
$time_post = microtime(true);
$exec_time = $time_post - $time_pre;
Data from slow query log
# Thread_id: 15 Schema: opta_matches2 QC_hit: No
# Query_time: 15.594386 Lock_time: 0.000089 Rows_sent: 923962 Rows_examined: 15688514
# Rows_affected: 0 Bytes_sent: 10726615
I am ok with dealing with a 15 second query if it is because that is how long it takes the data to move over the network, but if the query or my table can be optimized that is the best solution
The row count is not the issue, the following query
select m.id from matchOverview m
join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (1,2,3,4)
group by m.id
having sum(case when ma.attribute_id in (3,4)
then ma.attribute_value end) > 8
and sum(case when ma.attribute_id in (1,2)
then ma.attribute_value end) = 0;
returns only 24 rows but also takes ~15 seconds
CodePudding user response:
phpMyAdmin doesn't give you all results, it also using limit to default 25 results.
If you change this limit by changing "Number of rows" select box or type the limit in query, It will take more time to run the query.
CodePudding user response:
I think if you rewrote the conditions, at a minimum you might find something out. For instance, I think this does the same as the second example (the 24 results one);
SELECT
m.id
, at.total_12
, at.total_34
FROM matchOverview AS m
JOIN (
SELECT
m.id
, SUM(IF (ma.attribute_id IN(1,2), ma.attribute_value, 0)) AS total_12
, SUM(IF (ma.attribute_id IN(3,4), ma.attribute_value, 0)) AS total_34
FROM matchAttributes AS ma
WHERE m.id = ma.match_id
AND ma.attribute_id IN(1,2,3,4)
GROUP BY m.id
) AS at
WHERE at.total_12 > 0
AND at.total_34 > 8
It's more verbose, but it could help triangulate where the bottleneck(s) come from more readily.
For instance, if (a working) version of the above is still slow, then run the inner query with the GROUP BY
intact. Still slow? Remove the GROUP BY
. Move the GROUP BY/SUM
into the outer query, what happens?
That kinda thing. I can't run it so I can't work out a more precise answer, which I would like to know.