I'm trying to get summary data from a table (dd = devicesData) that receives a new row every 10 minutes. I want to summarise one day (24 hours) of data, i.e. collapse 144 rows into one.
devicesData is indexed on createdAt and id:
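| Table       | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| devicesData | 0          | PRIMARY     | 1            | id          | A         | 165482      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| devicesData | 1          | devesData_1 | 1            | devId       | A         | 21          | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| devicesData | 1          | fk_devicesD | 1            | roomId      | A         | 16          | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| devicesData | 1          | createdAt_  | 1            | createdAt   | A         | 156304      | NULL     | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| devicesData | 1          | dt_index    | 1            | dt          | A         | 164176      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| devicesData | 1          | idx_roId_dt | 1            | roomId      | A         | 17          | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| devicesData | 1          | idx_roId_dt | 2            | dt          | A         | 164876      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |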
SELECT
r.id AS `roomId`,
l.id AS `levelId`,
b.id AS `buildingId`,
s.id AS `siteId`,
date(CURDATE() - INTERVAL 1 DAY) AS `date`,
r.`name` AS `roomName`,
l.`name` AS `levelName`,
b.`name` AS `buildingName`,
s.`name` AS `siteName`,
r.`size` * r.`eui` * cdd.`value` / 3280 AS `referenceConsumption`,
SUM(dd.activeEnergy) AS `energy`,
SUM(dd.coolingEnergy) AS `coolingEnergy`,
SUM(dd.fanTime) AS `fanTime`,
SUM(dd.comprTime) AS `comprTime`,
SUM(dd.presence) AS `presenceTime`,
AVG(dd.temp1) AS `temp1`,
COUNT(dd.id) * 10 AS `conTime`
FROM rooms r
JOIN levels AS l ON l.id = r.levelId
JOIN buildings AS b ON b.id = l.buildingId
JOIN sites AS s ON s.id = b.siteId
LEFT JOIN Devices.coolingDegreeDays AS cdd
ON cdd.`day` = date(CURDATE() - INTERVAL 1 DAY)
LEFT JOIN devicesData dd
ON dd.roomId = r.id AND date(dd.dt) = date(CURDATE() - INTERVAL 1 DAY)
GROUP BY r.id;
For a minimal data set of 32 rooms, i.e. a maximum of 32 x 144 = 4608 entries, the response time is 0.5 seconds. Checking on a larger data set of 183 rooms x 144 rows = 26352 rows, the response time is 4 minutes. That is acceptable but still seems ridiculously slow for this small data set.
Note that a number of rows in devicesData are not populated (no data for that timestamp), so there is no data available for that id and time.
EXPLAIN devicesData result (the table definition):
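| Field    | Type    | Null | Key | Default           | Extra             |
+----------+---------+------+-----+-------------------+-------------------+
| id       | int     | NO   | PRI | NULL              | auto_increment    |
| devId    | varchar | NO   | MUL | NULL              |                   |
| roomId   | int     | NO   | MUL | NULL              |                   |
| dt       | datetim | NO   | MUL | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
| temp1    | float   | YES  |     | NULL              |                   |
| temp2    | float   | YES  |     | NULL              |                   |
| hum1     | float   | YES  |     | NULL              |                   |
| relay1   | tinyint | YES  |     | NULL              |                   |
| relay2   | tinyint | YES  |     | NULL              |                   |
| presence | float   | YES  |     | NULL              |                   |
| fanTime  | int     | YES  |     | NULL              |                   |
| comprTim | int     | YES  |     | NULL              |                   |
| activeEn | float   | YES  |     | NULL              |                   |
| reactEne | float   | YES  |     | NULL              |                   |
| airflowA | float   | YES  |     | NULL              |                   |
| coolingE | float   | YES  |     | NULL              |                   |
| createdA | int uns | YES  | MUL | NULL              |                   |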
I suspect that the culprit is this line:
LEFT JOIN devicesData dd
ON dd.roomId = r.id AND date(dd.dt) = date(CURDATE() - INTERVAL 1 DAY)
since it needs to evaluate date(dd.dt) for every row and will not benefit from the index.
Experimenting by replacing it with a range condition on the indexed field dd.createdAt (the row's creation time as a Unix timestamp):
LEFT JOIN devicesData dd
ON dd.roomId = r.id
AND dd.createdAt > unix_timestamp(CURDATE() - INTERVAL 1 DAY)
AND dd.createdAt < unix_timestamp(CURDATE())
the performance is poorer: a query which takes 0.5 seconds for 32 rooms now takes 2.5 seconds.
Is there an issue with the index or what is the reason for this relatively poor performance?
CodePudding user response:
Please include the EXPLAIN output for your query. It is possible that the stats have not updated and the index is not being used as you expect. Running ANALYZE TABLE devicesData; will update the stats.
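As a rough sketch of what to run (the SELECT is a cut-down version of the join from the question, not the full query), something like the following would refresh the statistics and then show the plan. In the EXPLAIN output, the key and rows columns on the dd line show which index, if any, was chosen and roughly how many rows the optimizer expects to examine:

```sql
-- Refresh the optimizer statistics for the table first.
ANALYZE TABLE devicesData;

-- Then ask the optimizer how it would execute the join in question.
-- Only the rooms/devicesData part of the original query is kept here.
EXPLAIN
SELECT r.id, COUNT(dd.id)
FROM rooms r
LEFT JOIN devicesData dd
       ON dd.roomId = r.id
      AND DATE(dd.dt) = CURDATE() - INTERVAL 1 DAY
GROUP BY r.id;
```

Running the same EXPLAIN again after changing the date condition makes it easy to compare the two plans side by side.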
As you have identified, the likely culprit is -
LEFT JOIN devicesData dd ON dd.roomId = r.id and date(dd.dt) = date(CURDATE()- INTERVAL 1 DAY)
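Wrapping dd.dt in DATE() means the engine is unable to use any index on the dt column for this condition (the DATE() around CURDATE() - INTERVAL 1 DAY is harmless, since that side is a constant). The index listing in the question does show indexes that include dt, but they cannot help while the column is wrapped in a function. Changing the join to compare the bare column against a range should improve things -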
LEFT JOIN devicesData dd
ON dd.roomId = r.id
AND dd.dt BETWEEN (CURDATE()- INTERVAL 1 DAY) AND (CURDATE()- INTERVAL 1 SECOND)
A composite index on (roomId, dt) should improve performance; if one does not exist yet, it can be added with -
ALTER TABLE `devicesData` ADD INDEX `idx_roomId_dt` (`roomId`, `dt`);
CodePudding user response:
(nnichols gives a good suggestion for making dd.dt sargeable)
Beware of SUM() with JOIN. All the JOINing (including NULLs for empty LEFT JOINs) is done first and written [logically] to a big temp table. Then the SUMs are done. Check your output for having counts and sums that are too big.
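To illustrate the effect, here is a minimal sketch with made-up temporary tables (demo_rooms, demo_data and demo_cdd are hypothetical and only loosely modelled on the tables in the question). It assumes that two rows in demo_cdd match the same day, which may or may not be the case for coolingDegreeDays:

```sql
-- Hypothetical miniature tables, only to illustrate the row multiplication.
CREATE TEMPORARY TABLE demo_rooms (id INT PRIMARY KEY);
CREATE TEMPORARY TABLE demo_data  (roomId INT, energy INT);
CREATE TEMPORARY TABLE demo_cdd   (`day` DATE, `value` INT);

INSERT INTO demo_rooms VALUES (1);
INSERT INTO demo_data  VALUES (1, 10), (1, 20);   -- true total for room 1: 30
INSERT INTO demo_cdd   VALUES ('2024-01-01', 5),
                              ('2024-01-01', 7);  -- two rows for the same day

-- Same join shape as the original query: both LEFT JOINs, then GROUP BY.
SELECT r.id,
       SUM(d.energy)   AS energy,
       COUNT(d.roomId) AS n
FROM demo_rooms r
LEFT JOIN demo_cdd  c ON c.`day` = '2024-01-01'
LEFT JOIN demo_data d ON d.roomId = r.id
GROUP BY r.id;
```

The SUM comes back as 60 rather than 30, and the count as 4 rather than 2, because each data row is joined once per matching demo_cdd row before the aggregation happens.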
Before I rewrite the entire query, please check to see that this produces the correct SUMs:
SELECT r.id, SUM(dd...) as energy, ...
FROM rooms r
JOIN devicesData dd ON dd.roomId = r.id
WHERE dd.dt >= CURDATE() - INTERVAL 1 DAY
AND dd.dt < CURDATE()
GROUP BY r.id;
If that is a 1:1 mapping, then you don't need the SUMs?
If that looks OK, then
SELECT ...
FROM rooms r
LEFT JOIN ( the above query ) AS sums ON sums.id = r.id
JOIN ... (the rest of the stuff)
-- no GROUP BY
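For illustration only, the outline above might be filled in roughly as follows. This is a sketch, not a tested rewrite: it aggregates devicesData by roomId directly inside the derived table (rather than joining rooms there), assumes coolingDegreeDays has at most one row per day, and uses COALESCE so that rooms without data still report a conTime of 0 as in the original query:

```sql
SELECT
    r.id AS roomId,
    l.id AS levelId,
    b.id AS buildingId,
    s.id AS siteId,
    CURDATE() - INTERVAL 1 DAY AS `date`,
    r.`name` AS roomName,
    l.`name` AS levelName,
    b.`name` AS buildingName,
    s.`name` AS siteName,
    r.`size` * r.`eui` * cdd.`value` / 3280 AS referenceConsumption,
    sums.energy,
    sums.coolingEnergy,
    sums.fanTime,
    sums.comprTime,
    sums.presenceTime,
    sums.temp1,
    COALESCE(sums.conTime, 0) AS conTime   -- 0 for rooms with no rows yesterday
FROM rooms r
JOIN levels    AS l ON l.id = r.levelId
JOIN buildings AS b ON b.id = l.buildingId
JOIN sites     AS s ON s.id = b.siteId
LEFT JOIN Devices.coolingDegreeDays AS cdd
       ON cdd.`day` = CURDATE() - INTERVAL 1 DAY
LEFT JOIN (
    -- Aggregate once per room, before joining to anything else.
    SELECT dd.roomId,
           SUM(dd.activeEnergy)  AS energy,
           SUM(dd.coolingEnergy) AS coolingEnergy,
           SUM(dd.fanTime)       AS fanTime,
           SUM(dd.comprTime)     AS comprTime,
           SUM(dd.presence)      AS presenceTime,
           AVG(dd.temp1)         AS temp1,
           COUNT(*) * 10         AS conTime
    FROM devicesData dd
    WHERE dd.dt >= CURDATE() - INTERVAL 1 DAY   -- half-open range on the bare column
      AND dd.dt <  CURDATE()
    GROUP BY dd.roomId
) AS sums ON sums.roomId = r.id;
-- no GROUP BY on the outer query
```

Because the aggregation is finished before the other joins are applied, extra matching rows in the other tables can no longer inflate the sums (though they could still duplicate output rows).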
In any case, these indexes may be beneficial:
r: INDEX(levelId, id, name, size, eui)
cdd: INDEX(day, value)
dd: INDEX(roomId, dt)
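Devices: INDEX(coolingDegreeDays) (only applicable if Devices is a table; in the query above it is used as a database name, in which case this line can be ignored)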
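Spelled out as DDL, the first two suggestions might look like the sketch below. The index names are made up, and it assumes rooms.name is a VARCHAR short enough to be indexed in full (a TEXT column would need a prefix length). The (roomId, dt) index on devicesData is not repeated here because the index listing in the question already shows one on those columns (idx_roId_dt):

```sql
-- Hypothetical index names; adjust to your own naming scheme.

-- Covering index for the columns the query reads from rooms.
ALTER TABLE rooms
    ADD INDEX idx_rooms_level_cover (levelId, id, `name`, `size`, eui);

-- Lets the lookup by day also return value straight from the index.
ALTER TABLE Devices.coolingDegreeDays
    ADD INDEX idx_cdd_day_value (`day`, `value`);
```

With only a few hundred rooms and one cooling-degree-day row per day, these are unlikely to matter nearly as much as the index on devicesData.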