I have two indexes showing very different results for similar searches, and I don't know why this is:
# The two indexes
"index_buyer_suppliers_on_buyer_id" btree (buyer_id)
"index_buyer_suppliers_on_supplier_id_and_buyer_id" UNIQUE, btree (supplier_id, buyer_id)
# The two searches
@current_suppliers = BuyerSupplier.where("buyer_id = ?", @entity.id).length
@current_buyers = BuyerSupplier.where("supplier_id = ?", @entity.id).length
# The EXPLAINs
1. EXPLAIN SELECT "buyer_suppliers".* FROM "buyer_suppliers" WHERE (buyer_id = 891285);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Scan using index_buyer_suppliers_on_buyer_id on buyer_suppliers (cost=0.43..54.48 rows=15 width=56)
Index Cond: (buyer_id = 891285)
(2 rows)
2. EXPLAIN SELECT "buyer_suppliers".* FROM "buyer_suppliers" WHERE (supplier_id = 891285);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on buyer_suppliers (cost=2309.31..32194.90 rows=99727 width=56)
Recheck Cond: (supplier_id = 891285)
-> Bitmap Index Scan on index_buyer_suppliers_on_supplier_id_and_buyer_id (cost=0.00..2284.38 rows=99727 width=0)
Index Cond: (supplier_id = 891285)
(4 rows)
# The EXPLAINs with (ANALYZE, BUFFERS)
1. EXPLAIN (ANALYZE, BUFFERS) SELECT "buyer_suppliers".* FROM "buyer_suppliers" WHERE (buyer_id = 891285);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using index_buyer_suppliers_on_buyer_id on buyer_suppliers (cost=0.43..54.48 rows=15 width=56) (actual time=0.042..0.042 rows=0 loops=1)
Index Cond: (buyer_id = 891285)
Buffers: shared hit=3
Planning Time: 0.103 ms
Execution Time: 0.057 ms
(5 rows)
Time: 0.399 ms
2. EXPLAIN (ANALYZE, BUFFERS) SELECT "buyer_suppliers".* FROM "buyer_suppliers" WHERE (supplier_id = 891285);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on buyer_suppliers (cost=2309.31..32194.90 rows=99727 width=56) (actual time=8.584..32.907 rows=99205 loops=1)
Recheck Cond: (supplier_id = 891285)
Heap Blocks: exact=9016
Buffers: shared hit=9399
-> Bitmap Index Scan on index_buyer_suppliers_on_supplier_id_and_buyer_id (cost=0.00..2284.38 rows=99727 width=0) (actual time=7.316..7.317 rows=99205 loops=1)
Index Cond: (supplier_id = 891285)
Buffers: shared hit=383
Planning Time: 0.107 ms
Execution Time: 39.306 ms
(9 rows)
Time: 39.806 ms
I always thought that just using the first part of a combined index was completely fine, but here we see a massive performance degradation from doing so. Why is this happening? Does this mean the recommended indexes to have would be these four: 1) :buyer_id
, 2) :supplier_id
, 3) [:buyer_id, :supplier_id]
, 4) [:supplier_id, :buyer_id]
? This seems like overkill but based on the above is it perhaps necessary?
CodePudding user response:
It is hardly surprising that an index scan that returns 99205 takes somewhat more time than an index scan that returns no row at all. 40 milliseconds doesn't seem slow for that. So there is no evidence here that using a multi-column index is much slower than a single column index.
To verify that, you can create an index on only supplier_id
and see if it is much slower. Any speed gains would have to be in the first 7 milliseconds when the actual index is scanned. The smaller index will be slightly faster, and PostgreSQL will prefer it, but don't expect an average speed gain of more than a few milliseconds.