Home > other >  Mysql sum() giving timeout
Mysql sum() giving timeout

Time:06-09

I have this SQL

SELECT a.name, (select sum(amount) from trx t where t.name=a.name and time > '2022-06-08') Sum FROM trx a
group by a.name

But it doesn't work :( I unfortunately get timeout. Can you see what's wrong?

CodePudding user response:

I don't know why you're using a correlated subquery. Just do the sum in the main query using GROUP BY.

SELECT name, SUM(amount) AS amount
FROM trx
WHERE time > '2022-06-08'
GROUP BY name

The only difference is that this will omit any names that didn't have any transactions after that date, instead of showing them with a zero count. You can fix this by joining with the list of all names.

SELECT t1.name, IFNULL(t2.amount, 0) as amount
FROM (SELECT distinct name FROM trx) AS t1
LEFT JOIN (
    SELECT name, SUM(amount) AS amount
    FROM trx
    WHERE time > '2022-06-08'
    GROUP BY name
) AS t2 ON t1.name = t2.name

CodePudding user response:

Your query may be ambiguous. MySQL has a mode that I call the "cheat mode" where it allows you to write invalid aggregation queries. In older versions this was even the default mode and you had to explicitly SET sql_mode = 'ONLY_FULL_GROUP_BY' to get out of that mode.

This query:

SELECT name, amount FROM trx GROUP BY name;

is invalid. I group by name, but want to show the amount. As there can be many amounts per name, this is not allowed. MySQL will raise an error when in ONLY_FULL_GROUP_BY mode. When in cheat mode, however, MySQL will muddle through by silently applying ANY_VALUE:

SELECT name, ANY_VALUE(amount) FROM trx GROUP BY name;

Now, where my query above has amount, yours has a correlated subquery on the name column. As you group by name, the DBMS can aggregate the rows to have one name per result row and then use this name to get the value from the subquery.

But if you are in cheat mode, the DBMS may also guess that you want to run the subquery on every single row and you want it to silently apply ANY_VALUE when aggregating. That would take much longer, as the DBMS would run the subquery again and again. Without an index, it would have to read the whole table as often as there are rows in the table. The result is the same, but runtime extremly long. This may explain the timeout.

When working with MySQL, make sure you always

SET sql_mode = 'ONLY_FULL_GROUP_BY';

Anyway, your query can be rewritten as

SELECT
  trx.name,
  SUM(CASE WHEN trx.time > DATE '2022-06-08' THEN trx.amount END) AS total
FROM trx
GROUP BY trx.name;

This runs through the whole table. You may be able to help the DBMS aggregating by name by providing an index on that column. Add the time and amount to get a covering index, if you want this even faster.

CREATE INDEX idx ON trx (name, time, amount);
  • Related