I got this query from a fellow programmer, but it executes extremely slowly. it takes about 4 seconds to run.
Can this query be optimized to give the same result, but perform better?
SELECT checkedcrates,
pv.name as powervision,
cg.name as cgrp,
cs.name as csz,
c.name as cname
FROM public.inspectionresultsstatistic e,
crates c,
powervisions pv,
lines l,
quality q,
cratesgroupscrates cgc,
cratesgroups cg,
cratessizes cs
where
c.id = e.crateid
and l.id = e.lineid
and pv.id = l.powervisionid
and q.id = e.qualityid
and c.id = cgc.crateid
and cs.id = cgc.cratesizeid
and cg.id = cgc.crategroupid
and qualityid = 0
and pv.name in ('PV101')
and c.name in ('24603','104','136','154','186','106','156','216','246','206')
and cg.name in ('Black','Blue','DLL','Green')
and cs.name in ('30x40','60x40')
and to_timestamp(e.startts) >= '2021-10-18T17:45:22Z'
and to_timestamp(e.stopts-1) <= '2021-10-18T19:45:22Z'
group by
powervision,
cgrp,
csz,
cname,
checkedcrates,
startts
EDIT: actually just noticed that it is the inner select being slow... updated the query above by removing the outer query
EDIT2: maybe I should add some indexes? I have a index for every table where it is connected to the other (so all the ID columns) and have for the inspectionresultsstatistic a index on id qualityid startts stopts
EDIT3: as per request I try to give more informations about my tables and the data.
I am using PostgreSql 12,
the table structures are as follows:
http://sqlfiddle.com/#!17/bb5a6/1
all tables are rather small with less than 50 entries, except for inspectionresultsstatistics that contains about 12.000.000 rows.
CodePudding user response:
My big tell with this is that your joins are not written with more updated syntax. When you use commas between tables in your FROM clause, it creates a 'Cartesian Product' of both tables. This is much more resource intensive than using 'INNER JOIN' followed by an 'ON' clause. The ON clause should be used instead of a constraint in the WHERE clause.
Somthing like,
SELECT checkedcrates,
pv.name as powervision,
cg.name as cgrp,
cs.name as csz,
c.name as cname
FROM public.inspectionresultsstatistic e
INNER JOIN crates c
ON c.id = e.crateid
INNER JOIN lines l
ON l.id = e.lineid
INNER JOIN powervisions pv
ON pv.id = l.powervisionid
INNER JOIN quality q
ON q.id = e.qualityid
INNER JOIN cratesgroupscrates cgc
ON c.id = cgc.crateid
INNER JOIN cratesgroups cg
ON cg.id = cgc.crategroupid
INNER JOIN cratessizes cs
ON cs.id = cgc.cratesizeid
where qualityid = 0
and pv.name in ('PV101')
and c.name in ('24603','104','136','154','186','106','156','216','246','206')
and cg.name in ('Black','Blue','DLL','Green')
and cs.name in ('30x40','60x40')
and to_timestamp(e.startts) >= '2021-10-18T17:45:22Z'
and to_timestamp(e.stopts-1) <= '2021-10-18T19:45:22Z'
group by
powervision,
cgrp,
csz,
cname,
checkedcrates,
startts
CodePudding user response:
Your fiddle doesn't include the indexes. You should use db-fiddle.com rather than sqlfiddle as the latter is essentially broken, having nothing newer than 9.6 to offer.
The time range being queried is pretty small, so presumably is very selective.
You would be better querying that as a range:
and tstzrange(to_timestamp(e.startts),to_timestamp(e.stopts-1),'[]') <@
tstzrange('2021-10-18T17:45:22Z','2021-10-18T19:45:22Z','[]')
Which would benefit from the expression index
on inspectionresultsstatistic using gist (tstzrange(to_timestamp(startts),to_timestamp(stopts-1),'[]'))
But really, probably better off to store your timestamps as timestamptz rather than int8, or maybe even store them as tstzrange directly.