I have a query with joins to rather large tables, but do not understand the slow performance of it.
Especially this part of the query plan seems weird to me (complete plan and query below):
-> Bitmap Heap Scan on order_line (cost=65.45..11521.37 rows=3228 width=20) (actual time=22.555..7764.120 rows=6250 loops=12)
Recheck Cond: (product_id = catalogue_product.id)
Heap Blocks: exact=71735
Buffers: shared hit=55299 read=16686
-> Bitmap Index Scan on order_line_product_id_e620902d (cost=0.00..64.65 rows=3228 width=0) (actual time=21.532..21.532 rows=6269 loops=12)
Index Cond: (product_id = catalogue_product.id)
Buffers: shared hit=143 read=107
Why does it need to recheck product_id = catalogue_product.id
which is the same as in index and then take so much time?
As far as i understand recheck is needed if a) only part of the condition can be covered by index or b) bitmap is too big and must be compressed - but then there should be a lossy=x entry, right?
Complete query:
SELECT ("order_order"."date_placed" AT TIME ZONE 'UTC')::date, "partner_partner"."odoo_id", "catalogue_product"."odoo_id", SUM("order_line"."quantity") AS "orders"
FROM "order_line"
INNER JOIN "order_order" ON ("order_line"."order_id" = "order_order"."id")
INNER JOIN "catalogue_product" ON ("order_line"."product_id" = "catalogue_product"."id")
INNER JOIN "partner_stockrecord" ON ("order_line"."stockrecord_id" = "partner_stockrecord"."id")
INNER JOIN "partner_partner" ON ("partner_stockrecord"."partner_id" = "partner_partner"."id")
WHERE (("order_order"."date_placed" AT TIME ZONE 'UTC')::date IN ('2022-11-22'::DATE)
AND "catalogue_product"."odoo_id" IN (6241, 6499, 6500, 49195, 44753, 44754, 53427, 6452, 44755, 44787, 6427, 6428)
AND "partner_partner"."odoo_id" IS NOT NULL AND NOT ("order_order"."status" IN ('Pending', 'PaymentDeclined', 'Canceled')))
GROUP BY ("order_order"."date_placed" AT TIME ZONE 'UTC')::date, "partner_partner"."odoo_id", "catalogue_product"."odoo_id", "order_line"."id"
ORDER BY "order_line"."id" ASC
Complete plan:
GroupAggregate (cost=141002.93..141003.41 rows=16 width=24) (actual time=93629.346..93629.369 rows=52 loops=1)
Group Key: order_line.id, ((timezone('UTC'::text, order_order.date_placed))::date), partner_partner.odoo_id, catalogue_product.odoo_id
Buffers: shared hit=56537 read=16693
-> Sort (cost=141002.93..141002.97 rows=16 width=20) (actual time=93629.331..93629.335 rows=52 loops=1)
Sort Key: order_line.id, partner_partner.odoo_id, catalogue_product.odoo_id
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=56537 read=16693
-> Hash Join (cost=2319.22..141002.61 rows=16 width=20) (actual time=859.917..93629.204 rows=52 loops=1)
Hash Cond: (partner_stockrecord.partner_id = partner_partner.id)
Buffers: shared hit=56537 read=16693
-> Nested Loop (cost=2318.11..141001.34 rows=16 width=24) (actual time=859.853..93628.903 rows=52 loops=1)
Buffers: shared hit=56536 read=16693
-> Hash Join (cost=2317.69..140994.41 rows=16 width=24) (actual time=859.824..93627.791 rows=52 loops=1)
Hash Cond: (order_line.order_id = order_order.id)
Buffers: shared hit=56328 read=16693
-> Nested Loop (cost=108.94..138731.32 rows=20700 width=20) (actual time=1.566..93206.434 rows=74999 loops=1)
Buffers: shared hit=55334 read=16686
-> Bitmap Heap Scan on catalogue_product (cost=43.48..87.52 rows=12 width=8) (actual time=0.080..0.183 rows=12 loops=1)
Recheck Cond: (odoo_id = ANY ('{6241,6499,6500,49195,44753,44754,53427,6452,44755,44787,6427,6428}'::integer[]))
Heap Blocks: exact=11
Buffers: shared hit=35
-> Bitmap Index Scan on catalogue_product_odoo_id_c5e41bad (cost=0.00..43.48 rows=12 width=0) (actual time=0.072..0.072 rows=12 loops=1)
Index Cond: (odoo_id = ANY ('{6241,6499,6500,49195,44753,44754,53427,6452,44755,44787,6427,6428}'::integer[]))
Buffers: shared hit=24
-> Bitmap Heap Scan on order_line (cost=65.45..11521.37 rows=3228 width=20) (actual time=22.555..7764.120 rows=6250 loops=12)
Recheck Cond: (product_id = catalogue_product.id)
Heap Blocks: exact=71735
Buffers: shared hit=55299 read=16686
-> Bitmap Index Scan on order_line_product_id_e620902d (cost=0.00..64.65 rows=3228 width=0) (actual time=21.532..21.532 rows=6269 loops=12)
Index Cond: (product_id = catalogue_product.id)
Buffers: shared hit=143 read=107
-> Hash (cost=2194.42..2194.42 rows=1147 width=12) (actual time=365.766..365.766 rows=1313 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 73kB
Buffers: shared hit=994 read=7
-> Index Scan using order_date_placed_utc_date_idx on order_order (cost=0.43..2194.42 rows=1147 width=12) (actual time=0.050..365.158 rows=1313 loops=1)
Index Cond: ((timezone('UTC'::text, date_placed))::date = '2022-11-22'::date)
Filter: ((status)::text <> ALL ('{Pending,PaymentDeclined,Canceled}'::text[]))
Rows Removed by Filter: 253
Buffers: shared hit=994 read=7
-> Index Scan using partner_stockrecord_pkey on partner_stockrecord (cost=0.41..0.43 rows=1 width=8) (actual time=0.017..0.017 rows=1 loops=52)
Index Cond: (id = order_line.stockrecord_id)
Buffers: shared hit=208
-> Hash (cost=1.05..1.05 rows=5 width=8) (actual time=0.028..0.028 rows=5 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on partner_partner (cost=0.00..1.05 rows=5 width=8) (actual time=0.013..0.015 rows=5 loops=1)
Filter: (odoo_id IS NOT NULL)
Buffers: shared hit=1
Planning time: 3.275 ms
Execution time: 93629.781 ms
CodePudding user response:
It is not the recheck that is expensive, it is the many table blocks that have to be read (shared hit=55299 read=16686
). And all that for 6250 rows. Either PostgreSQL has to dig through a lot of dead data (does VACUUM (VERBOSE)
on the table speak of many dead rows that cannot be removed yet?), or you have some large column values in the table that need to be read from TOAST.
CodePudding user response:
It doesn't have to do any rechecks. That line in the plan comes from the planner, not from the run-time part. (you can tell because if you just do EXPLAIN without ANALYZE, the line still appears.) At planning time, it doesn't know whether any of the bitmap will overflow, so it has to be prepared to do the recheck, even if that turns out not to be necessary to execute it at run time. The slowness almost certainly comes from the time spent reading 16686 random pages, which could be made clear by turning on track_io_timing.