ActiveRecord `in_batches` times out after scrolling a few pages

Time:07-28

I have a purge job that runs every day to clean up old records. I'm using PostgreSQL 10.21. The table it operates on has these attributes (the ones relevant to the question):

create table example_table
(
    id                          bigserial primary key,
    created_at                  timestamp not null
);

create index index_example_table_on_created_at on example_table (created_at desc);

The job that runs every day is:

ExampleTable.expired.in_batches(of: 1000, &:delete_all)

The table gets lots of inserts (more than reads) and currently has 6.5M rows. The timeout always happens while fetching the ids for one of the batches. Here is a sample of the ActiveRecord logs:

ExampleTable Pluck (2.7ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') ORDER BY "example_table"."id" ASC LIMIT $1  [["LIMIT", 1000]]
ExampleTable Pluck (2.0ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13534069], ["LIMIT", 1000]]
ExampleTable Pluck (2.0ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13535069], ["LIMIT", 1000]]
ExampleTable Pluck (1.8ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13536069], ["LIMIT", 1000]]
ExampleTable Pluck (1.8ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13537069], ["LIMIT", 1000]]
ExampleTable Pluck (2.2ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13538069], ["LIMIT", 1000]]
ExampleTable Pluck (2.1ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13539069], ["LIMIT", 1000]]
ExampleTable Pluck (1.9ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13540069], ["LIMIT", 1000]]
ExampleTable Pluck (5001.9ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13541069], ["LIMIT", 1000]]

After running EXPLAIN on the last two queries I got these plans:

For the query that takes only a few ms:

Limit  (cost=0.09..4343.25 rows=1000 width=8) (actual time=0.022..0.496 rows=1000 loops=1)
  Output: id
  Buffers: shared hit=318
  ->  Index Scan using example_table_pkey on public.example_table  (cost=0.09..363301.29 rows=83649 width=8) (actual time=0.021..0.430 rows=1000 loops=1)
        Output: id
        Index Cond: (example_table.id > 13540069)
        Filter: (example_table.created_at < '2022-04-23 19:49:08.020992'::timestamp without time zone)
        Buffers: shared hit=318
Planning time: 0.073 ms
Execution time: 0.545 ms

For the one that times out:

Limit  (cost=0.09..4343.93 rows=1000 width=8) (actual time=0.015..2702.674 rows=941 loops=1)
  Output: id
  Buffers: shared hit=2794464
  ->  Index Scan using example_table_pkey on public.example_table  (cost=0.09..363297.11 rows=83635 width=8) (actual time=0.015..2702.614 rows=941 loops=1)
        Output: id
        Index Cond: (example_table.id > 13541069)
        Filter: (example_table.created_at < '2022-04-23 19:49:08.020992'::timestamp without time zone)
        Rows Removed by Filter: 6577891
        Buffers: shared hit=2794464
Planning time: 0.086 ms
Execution time: 2702.721 ms

The only difference between the two plans is that the slow one removes a lot of rows with the filter, but I don't follow why they differ... The slow one also has a massive shared-buffer hit count compared to the other. I've tested removing the created_at condition and the query is fast again, so it must be something about that WHERE clause.

What could be causing this spike on a single page? Does Postgres not like scrolling through records under a condition that is not on the PK?

CodePudding user response:

Part of it is easy to explain. Eventually the scan runs out of expired records, and when it does it reads the entire rest of the table (through the index) in a futile effort to find more. The last batch only found 941 rows because that was all that remained, but it read through another 6,577,891 looking for the other 59 without finding them.
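The effect is easy to reproduce outside the database. Here is a toy simulation (hypothetical row counts, not the asker's data) of the keyset pagination that `in_batches` performs: every batch scans exactly 1000 rows until the final short batch, which scans every remaining row in the table without filling up.

```ruby
# Simulate keyset pagination by primary key with a non-key filter.
# 5,941 "expired" rows at the front of the table, 14,059 live rows after them.
rows = (1..20_000).map { |id| { id: id, expired: id <= 5_941 } }

batch_size = 1_000
last_id = 0
scanned_per_batch = []

loop do
  batch = []
  scanned = 0
  rows.each do |row|
    next unless row[:id] > last_id      # the "id > $1" keyset condition
    scanned += 1
    batch << row[:id] if row[:expired]  # the created_at filter
    break if batch.size == batch_size
  end
  scanned_per_batch << scanned
  last_id = batch.last unless batch.empty?
  break if batch.size < batch_size      # in_batches stops on a short batch
end

p scanned_per_batch  # => [1000, 1000, 1000, 1000, 1000, 15000]
```

The last entry is the analogue of the 6.5M-row filter scan above: the batch only finds 941 matches, so it keeps reading to the end of the id range.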

Obviously it would be better for this to use the index on created_at for the last batch, because it would know when the scan becomes futile and stop. But it doesn't do that because it doesn't know which batch is the last batch, as it thinks there were 83,635 rows left to find rather than 941. Stats are only recomputed occasionally, and that isn't frequent enough to save you in this case. If you could ANALYZE the table between every batch, that would probably fix the problem, but analyzing so often would itself be a high cost.

The reason it likes using the index on "id" is that that way it can satisfy the ORDER BY without having to read and sort an expected 83,635 rows. But why is the ORDER BY there at all? It is not implied by the code you show; `in_batches` adds it itself, because it paginates by remembering the last id of each batch, and that keyset pagination only works over a stable primary-key order.

Why is this even subject to a timeout? If it took a few seconds but succeeded rather than timing out, you would probably never notice. There is little reason to apply a strict timeout to background housekeeping jobs; maybe you can arrange to exempt this job from it. And why such small batches? Without the LIMIT at all, you probably wouldn't have this problem.
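If neither the timeout nor the batching can be lifted, one workaround (a sketch, not taken from the question) is to batch on created_at yourself, so each pluck can use the created_at index and the loop stops as soon as a batch comes back short. With ActiveRecord the two helpers below would be roughly `ExampleTable.expired.order(:created_at).limit(n).pluck(:id)` and `ExampleTable.where(id: ids).delete_all`; the in-memory stand-in only exists to make the sketch runnable here.

```ruby
# Purge in batches keyed on created_at rather than keyset-paginating on id.
# A short batch means nothing is left, so there is no futile tail scan.
def purge_in_batches(model, batch_size: 1_000)
  deleted = 0
  loop do
    ids = model.expired_ids(batch_size)
    break if ids.empty?
    deleted += model.delete(ids)
    break if ids.size < batch_size  # short batch: nothing left to find
  end
  deleted
end

# Hypothetical in-memory stand-in for the real ActiveRecord model.
class FakeModel
  def initialize(expired_ids)
    @expired = expired_ids.sort
  end

  def expired_ids(n)
    @expired.first(n)
  end

  def delete(ids)
    @expired -= ids
    ids.size
  end
end

p purge_in_batches(FakeModel.new((1..2_500).to_a))  # => 2500
```

Deleting by explicit id list also keeps each DELETE cheap, since the ids were already fetched by the index-backed pluck.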
