We've run into a strange issue with Postgres, unearthed as we prepared to upgrade from 10.21 to 11. What we initially took for a performance regression in newer versions of Postgres appears, in fact, to be a regression that surfaces once more accurate stats are generated for our database. As it turns out, both last_analyze and last_autoanalyze are NULL for the table in question. The upgrade corrects this, but makes a certain subset of queries much slower.
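For anyone checking the same thing, the analyze timestamps live in the standard pg_stat_user_tables view:
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'events_test';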
Background
We're currently running PG 10.21 in production. The table we're interacting with here, events_test, includes two columns that we're testing against: account (varchar(40)) and timestamp (timestamptz). There are individual indexes on account and on timestamp, and a composite index on account and timestamp.
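For reference, here is a minimal sketch of the relevant DDL. The single-column account index name is illustrative only; the other two index names, and the composite column order, match what appears in the plans below:
CREATE TABLE events_test (
    account     varchar(40),
    "timestamp" timestamptz
    -- ... remaining columns omitted
);
CREATE INDEX events_test_account_idx ON events_test (account);  -- name assumed
CREATE INDEX events_test_timestamp_idx ON events_test ("timestamp");
CREATE INDEX events_test_account_timestamp_idx
    ON events_test (account, "timestamp");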
Our test query is:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events_test
WHERE account = '1234' AND TIMESTAMP >= (NOW() AT TIME ZONE 'EST' - interval '1 month');
We're testing it against a couple of different account values: some we expect to return a large quantity of data (tens of thousands of rows for account 1234) and some a small quantity (tens of rows for account 5678).
On our current production DB, this runs quickly enough for our purposes: around 80-100ms for the large response (account 1234) and ~1ms for the small response (account 5678). In both cases, the query planner opts to start with an index scan on the composite index events_test_account_timestamp_idx.
Out-of-date stats
For the account returning a large number of rows, the query planner vastly underestimates the number of rows to be returned: it estimates ~500 rows but actually gets ~60,000. This is a result of the table stats being out of date on our production DB.
Manually running ANALYZE events_test;, or upgrading to PG11, PG12, or PG14 (which seems to cause this reanalysis to run shortly after the upgrade), updates the table stats and brings the row estimate much closer to the true value: the new estimate is on the order of ~20,000 rows for the account with more data.
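To see what the planner believes about the account column before and after analyzing, the standard pg_stats view can be queried:
SELECT attname, n_distinct, null_frac, most_common_freqs
FROM pg_stats
WHERE tablename = 'events_test' AND attname = 'account';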
Query planner
Unfortunately, this more-accurate estimate seems to result in the query planner running a much less efficient query. For the account with little data, the row estimate doesn't change and the planner continues to use the composite index. For the account with a larger amount of returned data, however, the planner now opts for an index scan on events_test_timestamp_idx, followed by a filter on the account field, which takes the execution time of the query up to 400-500ms.
The query that expects a small number of rows to be returned continues to use the composite index on all versions of Postgres.
What we've tried
We've dug around on SO looking for similar questions about Postgres bypassing a composite index, but nothing we've found has helped. Also:
Analyze, reindex, stats target
We've tried updating default_statistics_target from 100 to 1000 (and have also tried it briefly at 10 and 10000), followed by reanalyzing our table and reindexing; the slower query plan has not changed.
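For completeness, the statistics target can also be raised for just the account column rather than database-wide; a sketch, reusing the 1000 value we tried globally:
ALTER TABLE events_test ALTER COLUMN account SET STATISTICS 1000;
ANALYZE events_test;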
Postgres versions
This issue was initially revealed as we validated against newer versions of Postgres. We've tested after upgrading to PG11, PG12 and PG14, and the slower query plan shows up in all cases. We can also cause it to happen on PG10 by reanalyzing to update our stats, so it doesn't appear that this is a change in planner behaviour in a Postgres update.
Force composite index
We've verified that we can indeed force Postgres to use the composite index by (temporarily) dropping the timestamp index the planner now prefers, with this test command:
BEGIN;
DROP INDEX events_test_timestamp_idx;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events_test
WHERE account = '1234' AND TIMESTAMP >= (NOW() AT TIME ZONE 'EST' - interval '1 month');
ROLLBACK;
This causes Postgres to revert to the composite index events_test_account_timestamp_idx; however, it now chooses to do a bitmap index scan on this index. This continues to run in under 100ms, and is comparable in performance to the previous straight index scan on the composite index.
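Note that DROP INDEX takes an ACCESS EXCLUSIVE lock on the table until the rollback, so we only use it as a quick test. A lock-free sketch of roughly the same experiment is to disable plain index scans for the transaction (bitmap scans are governed separately, by enable_bitmapscan), assuming the planner then falls back to the bitmap plan:
BEGIN;
SET LOCAL enable_indexscan = off;  -- disable plain index scans for this transaction only
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events_test
WHERE account = '1234' AND TIMESTAMP >= (NOW() AT TIME ZONE 'EST' - interval '1 month');
ROLLBACK;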
Query EXPLAINs
Here is what the query looks like when it's running more efficiently, in production, on PG10:
Index Scan using events_test_account_timestamp_idx on events_test (cost=0.11..1109.11 rows=579 width=206) (actual time=0.062..116.439 rows=71700 loops=1)
Index Cond: (((account)::text = '1234'::text) AND ("timestamp" >= (timezone('EST'::text, now()) - '1 mon'::interval)))
Buffers: shared hit=68619 read=1249
I/O Timings: read=23.426
Planning time: 0.662 ms
Execution time: 119.339 ms
And here is what it looks like once it has updated statistics, on PG14 (but the output is similar on all versions):
Index Scan using events_test_timestamp_idx on events_test (cost=0.11..31867.79 rows=18994 width=204) (actual time=0.164..311.960 rows=55583 loops=1)
Index Cond: ("timestamp" >= ((now() AT TIME ZONE 'EST'::text) - '1 mon'::interval))
Filter: ((account)::text = '1234'::text)
Rows Removed by Filter: 462327
Buffers: shared hit=492867
Planning:
Buffers: shared hit=144 read=30
I/O Timings: read=6.160
Planning Time: 7.021 ms
Execution Time: 314.676 ms
And finally, if we force it to use the composite index by temporarily dropping the timestamp index, on PG14 (but again, similar on all versions):
Bitmap Heap Scan on events_test (cost=187.05..35867.12 rows=18992 width=204) (actual time=11.373..38.937 rows=55582 loops=1)
Recheck Cond: (((account)::text = '1234'::text) AND ("timestamp" >= ((now() AT TIME ZONE 'EST'::text) - '1 mon'::interval)))
Heap Blocks: exact=13586
Buffers: shared hit=13803
-> Bitmap Index Scan on events_test_account_timestamp_idx (cost=0.00..186.10 rows=18992 width=0) (actual time=9.376..9.376 rows=55582 loops=1)
Index Cond: (((account)::text = '1234'::text) AND ("timestamp" >= ((now() AT TIME ZONE 'EST'::text) - '1 mon'::interval)))
Buffers: shared hit=217
Planning:
Buffers: shared hit=5
Planning Time: 0.316 ms
Execution Time: 41.446 ms
Edit: It seems like the core issue here is that, with an accurate estimate, Postgres expects the scan-and-filter on the single-column timestamp index to be cheaper (31867.79) than using the composite index (35867.12), when in fact the composite is nearly 10x faster in the case above. Which is interesting, given that the last two EXPLAINs above are using the same set of statistics to come up with these predictions.
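One way to probe whether the planner's cost constants are to blame (we haven't settled this yet) is to re-run the comparison with a cheaper assumed random-I/O cost and see whether the preference flips:
SET random_page_cost = 1.1;  -- default is 4.0; lower values suit SSD-backed storage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events_test
WHERE account = '1234' AND TIMESTAMP >= (NOW() AT TIME ZONE 'EST' - interval '1 month');
RESET random_page_cost;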
CodePudding user response:
Sometimes, when a query selects only a small amount of data, or when the rows it fetches greatly outnumber the rows that remain, PostgreSQL uses a filter instead of an index and discards the non-matching rows. If you want to override this and always use an index, try checking the execution plan and timing with enable_seqscan temporarily set to off, and, if possible, read this as well, you may need it: Examining Index Usage
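A sketch of that test, scoped to a single transaction so the setting doesn't leak into the session:
BEGIN;
SET LOCAL enable_seqscan = off;  -- the planner avoids sequential scans where possible
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events_test
WHERE account = '1234' AND TIMESTAMP >= (NOW() AT TIME ZONE 'EST' - interval '1 month');
ROLLBACK;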
CodePudding user response:
I can't put my finger on the problem, but here are a few considerations:
Did you create the composite index on (account, "timestamp")? The order matters, and this order would be better.

It seems that PostgreSQL expects that the condition on account is not very selective, but it is wrong. Perhaps extended statistics can improve its estimate:

CREATE STATISTICS events_test_account_timestamp (dependencies, mcv)
  ON account, "timestamp" FROM events_test;
ANALYZE events_test;

Did you set random_page_cost the same on both systems?
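For example, run this on each system and compare the values:
SHOW random_page_cost;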