I have a biggish table in PostgreSQL 15.1, maybe 50 million rows and growing. A column mmsi has about 30k distinct values, so on average over 1,000 rows per mmsi value.
My problem is that I have a query that I need to execute repeatedly during DB load, and for certain values of mmsi it takes hundreds of seconds instead of milliseconds. The model query is simply:
SELECT max(to_timestamp) FROM track WHERE mmsi = :mmsi
The anomaly is visible in the EXPLAIN output. The bad case (which only happens for a small fraction of mmsi values):
trackdb=# EXPLAIN SELECT max(to_timestamp) FROM track WHERE mmsi = 354710000;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Result  (cost=413.16..413.17 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..413.16 rows=1 width=8)
           ->  Index Scan Backward using ix_track_to_timestamp on track  (cost=0.56..3894939.14 rows=9440 width=8)
                 Index Cond: (to_timestamp IS NOT NULL)
                 Filter: (mmsi = 354710000)
(6 rows)
Good case (the vast majority):
trackdb=# EXPLAIN SELECT max(to_timestamp) FROM track WHERE mmsi = 354710001;
                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=1637.99..1638.00 rows=1 width=8)
   ->  Index Scan using ix_track_mmsi on track  (cost=0.44..1635.28 rows=1082 width=8)
         Index Cond: (mmsi = 354710001)
(3 rows)
Now, I notice that the estimated number of rows is larger in the bad case, but I cannot see anything in the PostgreSQL statistics (pg_stats.histogram_bounds) that explains it.
The problem shifts when I ANALYZE the table, in that the specific values that trigger it become different. But either way, since this query is needed during DB load, ANALYZE is not a solution.
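(For reference, the statistics in question can be inspected with something like the query below; these are standard columns of the pg_stats view, applied to the track table and columns from the plans above.)
SELECT attname, n_distinct, correlation,
       most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'track'
  AND attname IN ('mmsi', 'to_timestamp');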
I'm stumped. Does anyone have an idea what could be happening?
[Edit: To clarify, I know ways to work around it, for example by materializing the rows before applying max. But not understanding makes me very unhappy.]
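(One shape such a workaround could take, purely as an illustration, is to force the matching rows to be gathered first with a materialized CTE, so the planner cannot pick the backward scan over to_timestamp:)
WITH t AS MATERIALIZED (
    SELECT to_timestamp FROM track WHERE mmsi = :mmsi
)
SELECT max(to_timestamp) FROM t;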
CodePudding user response:
As Laurenz has explained, the problem is that PostgreSQL thinks the approximately 10,000 rows with mmsi = 354710000 are scattered randomly over the values of to_timestamp. It therefore reasons that by scanning the index on to_timestamp backward, it can stop as soon as it finds the first row with mmsi = 354710000, and that this will happen quickly. But if all of the mmsi = 354710000 rows sit at the wrong end of the index, it does not in fact happen quickly. There is nothing you can do about this in the statistics, as there are no "handles" for the planner to grab onto that would better inform its thinking. Maybe some future extension to the custom statistics feature will provide one.
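You can see this assumption directly in the cost numbers of the bad plan: the full backward index scan is costed at 3894939.14 for an estimated 9440 matching rows, and 0.56 + (3894939.14 - 0.56) / 9440 ≈ 413.16, which is exactly the cost shown on the Limit node. In other words, the planner charges only 1/9440th of the scan because it expects to stumble on a matching row that soon.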
Quoting your edit: "To clarify, I know ways to work around it, for example by materializing the rows before applying max."
A better workaround would probably be an index on (mmsi, to_timestamp). This would not only fix the cases where the planner currently chooses a very bad plan, it would also substantially improve the cases that currently use a tolerable plan, by giving them an even better option. And you don't need to rewrite the query. You can also drop the existing index on just mmsi, as there is no reason to have both once the composite index exists.
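Concretely, that could look like the statements below (the index name is just a placeholder; CONCURRENTLY is used so the build does not block the load that is going on):
CREATE INDEX CONCURRENTLY ix_track_mmsi_to_timestamp
    ON track (mmsi, to_timestamp);
-- once the composite index exists, the single-column index is redundant
DROP INDEX CONCURRENTLY ix_track_mmsi;
With that index in place, max(to_timestamp) for a given mmsi can be answered by a single descent to the high end of that mmsi's index entries, regardless of how the rows are distributed.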