I am trying to optimize this query:
SELECT eq.*,
reg_last_dt.dt as reg_last_date
FROM Equipment eq
INNER JOIN (select max( dt ) as dt, id_eq_equipment from consum group by id_eq_equipment) as reg_last_dt
ON reg_last_dt.id_eq_equipment = eq.id_eq
Explain shows me this:
Hash Join (cost=839806.69..839833.33 rows=23 width=1461)
Hash Cond: (eq.id_eq = consum.id_eq_equipment)
-> Seq Scan on equipment eq (cost=0.00..26.29 rows=129 width=1453)
-> Hash (cost=839806.40..839806.40 rows=23 width=10)
-> Finalize GroupAggregate (cost=839805.60..839806.17 rows=23 width=10)
Group Key: consum.id_eq_equipment
-> Sort (cost=839805.60..839805.71 rows=46 width=10)
Sort Key: consum.id_eq_equipment
-> Gather (cost=839799.50..839804.33 rows=46 width=10)
Workers Planned: 2
-> Partial HashAggregate (cost=838799.50..838799.73 rows=23 width=10)
Group Key: consum.id_eq_equipment
-> Parallel Seq Scan on consum (cost=0.00..755192.33 rows=16721433 width=10)
This looks not very optimal. Is there anything I could do to make it better?
CodePudding user response:
The row estimates in the query plan (only rows=129
for Equipment
, and only rows=23
for aggregated consum
) indicate that this query using a LATERAL
subquery instead should perform much faster:
SELECT eq.*, r.reg_last_date
FROM Equipment eq
CROSS JOIN LATERAL (
SELECT max(dt) AS reg_last_date
FROM consum c
WHERE c.id_eq_equipment = eq.id_eq
) r;
Be sure to have a multicolumn index on consum(id_eq_equipment, dt)
!
Related:
Maybe your really want a LEFT JOIN
to return all rows from Equipment
? See:
CodePudding user response:
If the estimates in the plan are correct, it would almost certainly be faster to do it with a subselect, this way:
SELECT
eq.*,
(select max( dt ) from consum where consum.id_eq_equipment = eq.id_eq) as reg_last_date
FROM Equipment eq
Note this will return NULL for reg_last_date where there is no corresponding record in consum, so you might want to filter those out if you don't want to see them.
You would need an index on (id_eq_equipment, dt)
to make it fast