SELECT results.idpatient AS nhs_number, TIMESTAMPDIFF(YEAR, STR_TO_DATE(results.dob,'%Y-%m-%d'), CURDATE()) AS age, most_overdue.days_overdue,
most_overdue.current_status, most_overdue.action, indications.associated_indications
FROM mytable AS results
INNER JOIN (
SELECT
idpatient,
MAX(days_overdue) as days_overdue,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM
mytable
GROUP BY
idpatient
) AS most_overdue
ON results.idpatient = most_overdue.idpatient
INNER JOIN (
SELECT idpatient, action, GROUP_CONCAT(indication SEPARATOR ',') as associated_indications
FROM mytable
GROUP BY idpatient, action
) AS indications
ON results.idpatient = indications.idpatient AND most_overdue.action = indications.action
GROUP BY results.idpatient
LIMIT 0,100;
The above query takes ~4s to run on my big MySQL DB, and the issue seems to be the GROUP BYs.
See a simplified Fiddle here .
I read MySQL's page on optimizing GROUP BY, but this doesn't seem to cover its use in conjunction with GROUP_CONCAT.
The schema is as follows:
CREATE TABLE `mytable` (
`idpatient` varchar(32) NOT NULL,
`indication` varchar(255) NOT NULL,
`action` varchar(255) NOT NULL,
`current_status` varchar(255) NOT NULL,
`query_type` varchar(255) NOT NULL,
`last_date` varchar(255) NOT NULL,
`days_overdue` bigint(20) DEFAULT NULL,
`dob` varchar(255) NOT NULL,
PRIMARY KEY (`idpatient`,`indication`,`action`),
KEY `action_idx` (`action`),
KEY `indication_idx` (`indication`),
KEY `idpatient_action_idx` (`idpatient`,`action`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
EXPLAIN tells me the indication_idx and idpatient_action_idx are in use when running the slow query.
Any help towards getting this under 1s would be much appreciated, thanks!
CodePudding user response:
I guess, without any evidence to support my guess, that you want your resultset to show the hundred lowest idpatient
values.
Your query is made of two subqueries and a main query. Let's start with the first subquery.
SELECT
idpatient,
MAX(days_overdue) as days_overdue,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM mytable
GROUP BY idpatient
ORDER BY idpatient
LIMIT 0, 100;
Notice that I included your LIMIT clause in this subquery.
This subquery will benefit from the following covering index.
ALTER TABLE mytable
ADD INDEX patient_days_status_activity
(idpatient, days_overdue, current_status, action);
The first two columns of the index support your GROUP BY
operation. If your table is smaller than something like 100k rows you can omit the last two columns from that index; they only serve to make the lookup of current_status
and action
a bit faster.
With the index in place examine the performance of that subquery; look at EXPLAIN
output. Convince yourself it's fast enough for your purposes.
Now for the second subquery. Your idpatient_action_idx
index helps with this query. You should add the ORDER BY / LIMIT clauses to it. You could also add indication
to that index to make it into a covering index.
ALTER TABLE mytable
DROP INDEX idpatient_action_idx,
ADD INDEX idpatient_action_idx (idpatient, action, indication);
Make the index change if you want to. Then examine the performance of that subquery as well. If both the subqueries have acceptable performance your main query probably will too.
CodePudding user response:
Does this query give the correct result (I assume it is still slow)?
SELECT results.idpatient AS nhs_number
, most_overdue.days_overdue
, most_overdue.current_status
, most_overdue.action
, indications.associated_indications
FROM (
SELECT distinct idpatient
FROM mytable
) AS results
INNER JOIN (
SELECT
idpatient,
MAX(days_overdue) as days_overdue,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM
mytable
GROUP BY
idpatient
) AS most_overdue
ON results.idpatient = most_overdue.idpatient
INNER JOIN (
SELECT idpatient, action, GROUP_CONCAT(indication SEPARATOR ',') as associated_indications
FROM mytable
GROUP BY idpatient, action
) AS indications
ON results.idpatient = indications.idpatient AND most_overdue.action = indications.action;
This may be a cleaner way to derive the current_status/action associated with most days_overdue:
SELECT results.idpatient AS nhs_number
, (most_overdue.days_overdue)
, (results.current_status)
, results.action
, indications.associated_indications
FROM mytable AS results
JOIN (
SELECT
idpatient,
MAX(days_overdue) as days_overdue
FROM mytable
GROUP BY idpatient
) AS most_overdue
ON results.idpatient = most_overdue.idpatient
AND results.days_overdue = most_overdue.days_overdue
JOIN (
SELECT idpatient, action, GROUP_CONCAT(indication SEPARATOR ',') as associated_indications
FROM mytable
GROUP BY idpatient, action
) AS indications
ON results.idpatient = indications.idpatient
AND results.action = indications.action
-- GROUP BY results.idpatient, results.action
ORDER BY results.idpatient;
I can't help thinking that it should be possible to remove at least 1 access against mytable, but right now I don't see how