Postgresql - Index scan - Slow filtering

I am trying to improve query performance on a big (500M rows) time-partitioned table. Here is the simplified table structure:

CREATE TABLE execution (
    start_time              TIMESTAMP WITH TIME ZONE NOT NULL,
    end_time                TIMESTAMP WITH TIME ZONE,
    restriction_criteria    VARCHAR(36)  NOT NULL
) PARTITION BY RANGE (start_time);

Time partitioning

  • is based on the start_time column because the end_time value is not known when the row is created.
  • is used to implement the retention policy efficiently (see the sketch below).

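For context (an illustrative sketch, inferred from the partition names visible in the plan below; not part of the original schema), monthly partitions and the retention policy look like this:

CREATE TABLE execution_2021_12 PARTITION OF execution
    FOR VALUES FROM ('2021-12-01') TO ('2022-01-01');

-- Retention: dropping an expired partition is a cheap metadata operation.
DROP TABLE execution_2021_10;
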
Requests follow this generic pattern:

SELECT *
FROM execution
WHERE start_time BETWEEN :from AND :to
  AND restriction_criteria IN ('123', '456')
ORDER BY end_time DESC, id
    FETCH NEXT 20 ROWS ONLY;

I got the "best" performance using this index:

CREATE INDEX IF NOT EXISTS end_time_desc_start_time_index ON execution USING btree (end_time DESC, start_time);

Yet, performance is still not good enough.

Limit  (cost=1303.21..27189.31 rows=20 width=1674) (actual time=6791.191..6791.198 rows=20 loops=1)
  ->  Incremental Sort  (cost=1303.21..250693964.74 rows=193689 width=1674) (actual time=6791.189..6791.194 rows=20 loops=1)
        Sort Key: execution.end_time DESC, execution.id
        Presorted Key: execution.end_time
        Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 64kB  Peak Memory: 64kB
        ->  Merge Append  (cost=8.93..250685248.74 rows=193689 width=1674) (actual time=4082.161..6791.047 rows=21 loops=1)
              Sort Key: execution.end_time DESC
              Subplans Removed: 15
              ->  Index Scan using execution_2021_10_end_time_start_time_idx on execution_2021_10 execution_1  (cost=0.56..113448316.66 rows=93103 width=1674) (actual time=578.896..578.896 rows=1 loops=1)
                    Index Cond: ((start_time <= '2021-12-05 02:00:04+00'::timestamp with time zone) AND (start_time >= '2021-10-02 02:00:04+00'::timestamp with time zone))
                    Filter: (((restriction_criteria)::text = ANY ('{123,456}'::text[])))
                    Rows Removed by Filter: 734
              ->  Index Scan using execution_2021_11_end_time_start_time_idx on execution_2021_11 execution_2  (cost=0.56..113653576.54 rows=87605 width=1674) (actual time=116.841..116.841 rows=1 loops=1)
                    Index Cond: ((start_time <= '2021-12-05 02:00:04+00'::timestamp with time zone) AND (start_time >= '2021-10-02 02:00:04+00'::timestamp with time zone))
                    Filter: (((restriction_criteria)::text = ANY ('{123,456}'::text[])))
                    Rows Removed by Filter: 200
              ->  Index Scan using execution_2021_12_end_time_start_time_idx on execution_2021_12 execution_3  (cost=0.56..16367185.18 rows=12966 width=1674) (actual time=3386.416..6095.261 rows=21 loops=1)
                    Index Cond: ((start_time <= '2021-12-05 02:00:04+00'::timestamp with time zone) AND (start_time >= '2021-10-02 02:00:04+00'::timestamp with time zone))
                    Filter: (((restriction_criteria)::text = ANY ('{123,456}'::text[])))
                    Rows Removed by Filter: 5934
Planning Time: 4.108 ms
Execution Time: 6791.317 ms

The index Filter step is very slow.

I set up a multi-column index, hoping the filtering would be done in the Index Cond. But it doesn't work:

CREATE INDEX IF NOT EXISTS pagination_index ON execution USING btree (end_time DESC, start_time, restriction_criteria);

My feeling is that the first index column should be end_time, because we want to leverage the btree index's sorting capability. The second one should be restriction_criteria, so that an Index Cond filters out rows that don't match the restriction_criteria. However, this doesn't work, because the query planner also needs to check the start_time clause.
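For illustration, the column order that reasoning points to would be something like this (hypothetical index name; as the second answer below explains, the =ANY check is still applied as a Filter against table rows):

CREATE INDEX IF NOT EXISTS end_time_criteria_start_time_index
    ON execution USING btree (end_time DESC, restriction_criteria, start_time);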

The alternative I imagine is to get rid of the partitioning, because a multi-column (end_time, restriction_criteria) index would work just fine.

Yet, this is not a perfect solution because dealing with our retention policy would become a pain.

Is there another alternative that would allow keeping the start_time partitioning?

CodePudding user response:

While you can have the output sorted if the leading column is end_time, you can reduce the amount of data processed if you use start_time as the leading column.

Since your filter on start_time and restriction_criteria excludes ~7000 rows in order to retrieve 20, speeding up the filtering may be more important than speeding up the sorting.

CREATE INDEX IF NOT EXISTS execution_start_time_restriction_idx 
 ON execution USING btree (start_time, restriction_criteria);

CREATE INDEX IF NOT EXISTS execution_restriction_start_time_idx 
  ON execution USING btree (restriction_criteria, start_time);

ANALYZE execution;

If

SELECT count(*)
FROM execution
WHERE start_time BETWEEN :from AND :to
  AND restriction_criteria IN ('123', '456')

is more than the number of rows removed by the filter, then having end_time as the leading column might be a good idea. But the planner should be able to figure that out for you.
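One way to verify which index the planner actually picks is an EXPLAIN run (a sketch; the literal bounds stand in for :from and :to and are taken from the plan above):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM execution
WHERE start_time BETWEEN '2021-10-02 02:00:04+00' AND '2021-12-05 02:00:04+00'
  AND restriction_criteria IN ('123', '456')
ORDER BY end_time DESC, id
    FETCH NEXT 20 ROWS ONLY;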

In the end, if some of those indexes are not used, you can drop them.
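To find unused indexes, you can check the statistics views (a sketch; the LIKE pattern assumes the partition naming seen in the plan above):

SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname LIKE 'execution%'
ORDER BY idx_scan;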

CodePudding user response:

I set up a multi-column index hoping the filtering would be done in the Index cond

The index machinery is very circumspect about what code it runs inside the index. It won't call any operators that it doesn't 'trust', because if the operator throws an error then the whole query will error out, possibly due to rows that weren't even user 'visible' in the first place (i.e. ones that were already deleted or created but never committed). No one wants that. Now the =ANY construct could be considered trustable, but it is not. That means it won't be applied in the Index Cond, but must be applied against the table row, which in turn means you need to visit the table, which is probably where all your time is going, visiting random table rows.

I don't know what it would take code-wise to make =ANY trusted. I've made efforts to investigate that in the past but really never got anywhere, the code around the ANY is too complicated for me to grasp. That would be a nice improvement for the future, but won't help you now anyway.

One way around this is to get an index-only scan. At that point it will run arbitrary code inside the index, as it already knows the tuple is visible. But it won't do that for you here, because you are selecting at least one column not in the index (and also not shown in your CREATE command, but obviously present anyway).

If you create an index like your widest one but with "id" added to the end, and select only from among those columns, then you should get much faster index-only scans with merge appends.
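Something along these lines (the index name is illustrative):

CREATE INDEX IF NOT EXISTS pagination_covering_index
    ON execution USING btree (end_time DESC, start_time, restriction_criteria, id);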

If you have even more columns than the ones you've shown plus "id", and you really need to select those columns, and don't want to add all of them to the index, then you can use a trick to get an index-only scan anyway by doing a dummy self-join:

WITH t AS (
    SELECT id
    FROM execution
    WHERE start_time BETWEEN :from AND :to
      AND restriction_criteria IN ('123', '456')
    ORDER BY end_time DESC, id
        FETCH NEXT 20 ROWS ONLY
)
SELECT real.*
FROM execution real
JOIN t USING (id)
ORDER BY end_time DESC, id;
(If "id" is not unique, then you might need to join on an additional column. Also, you would need an index on "id", which you probably already have.)

This one will still need to visit the table to fetch the extra columns, but only for the 20 rows being returned, not for all the ones failing the restriction_criteria.

If the restriction_criteria is very selective, another approach might be better: an index on, or leading with, that column. It will need to read and sort all of those rows (in the relevant partitions) before applying the LIMIT, but if the criteria is very selective this will not take long.
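A minimal sketch of that approach (illustrative name; with this leading column it mirrors the execution_restriction_start_time_idx suggested in the first answer):

CREATE INDEX IF NOT EXISTS restriction_criteria_leading_index
    ON execution USING btree (restriction_criteria, start_time);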
