I'm using PostgreSQL 12.8 and I have two tables:
CREATE TABLE s (
id BIGINT PRIMARY KEY,
type VARCHAR,
active BOOLEAN
);
With more than 230k rows, and
CREATE TABLE s_aud (
id BIGINT NOT NULL,
type VARCHAR,
revision_id INT4 NOT NULL,
revision_type SMALLINT NOT NULL,
CONSTRAINT s_aud_pk PRIMARY KEY (id, revision_id)
);
that contains more than 4M rows and is an apprend-only table where we store every add, update or delete operation done in s table
, s_aud
table does not contain any FK to s table
. The problem is that I would like to execute the following query:
SELECT s.*, a.revision_id
FROM s
JOIN (
SELECT id, MAX(revision_id) AS revision_id
FROM s_aud
WHERE revision_type <> 2 AND type = 'X_TYPE'
GROUP BY id
) a ON s.id = a.id
WHERE s.type = 'X_TYPE' AND s.active = true;
Which is something like, get the latest revision_id of revision_type 2 of every id in the table.
If I execute the query, it takes more than 10 minutes to execute which is not acceptable, how can I improve it? I tried to add an index to:
CREATE INDEX s_aud_id_idx ON s_aud (id);
CREATE INDEX s_aud_revision_type_idx ON s_aud (revision_type);
But didn't affected to the performance of the query, any ideas?
EDIT, EXPLAIN WITH (ANALYZE, VERBOSE, BUFFERS, format text)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=138478.70..149103.60 rows=1 width=238) (actual time=432.466..327417.023 rows=744 loops=1)
Output: s.id, s.type, s.active, (max(s_aud.revision_id))
Inner Unique: true
Join Filter: (s.id = s_aud.id)
Rows Removed by Join Filter: 276396
Buffers: shared hit=77360594 read=7600
I/O Timings: read=22.744
-> Gather (cost=1000.00..10021.67 rows=1 width=234) (actual time=0.296..1.279 rows=744 loops=1)
Output: s.id, s.type, s.active
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=218 read=7600
I/O Timings: read=22.744
-> Parallel Seq Scan on db_schema.s (cost=0.00..9021.57 rows=1 width=234) (actual time=0.077..23.769 rows=248 loops=3)
Output: s.id, s.type, s.active
Filter: (s.active AND ((s.type)::text = 'X_TYPE'::text))
Rows Removed by Filter: 76643
Buffers: shared hit=218 read=7600
I/O Timings: read=22.744
Worker 0: actual time=0.132..36.318 rows=463 loops=1
Buffers: shared hit=217 read=3811
I/O Timings: read=11.326
Worker 1: actual time=0.016..34.903 rows=280 loops=1
Buffers: shared read=3785
I/O Timings: read=11.401
-> Finalize GroupAggregate (cost=137478.70..138951.16 rows=5812 width=12) (actual time=439.679..440.021 rows=372 loops=744)
Output: s_aud.id, max(s_aud.revision_id)
Group Key: s_aud.id
Buffers: shared hit=26138272
-> Gather Merge (cost=137478.70..138834.92 rows=11624 width=12) (actual time=439.672..439.859 rows=1111 loops=744)
Output: s_aud.id, (PARTIAL max(s_aud.revision_id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=26138272
-> Sort (cost=136478.67..136493.20 rows=5812 width=12) (actual time=435.151..435.183 rows=581 loops=2232)
Output: s_aud.id, (PARTIAL max(s_aud.revision_id))
Sort Key: s_aud.id
Sort Method: quicksort Memory: 59kB
Worker 0: Sort Method: quicksort Memory: 59kB
Worker 1: Sort Method: quicksort Memory: 59kB
Buffers: shared hit=77360376
Worker 0: actual time=433.575..433.613 rows=689 loops=744
Buffers: shared hit=25663619
Worker 1: actual time=434.099..434.136 rows=682 loops=744
Buffers: shared hit=25627461
-> Partial HashAggregate (cost=136057.16..136115.28 rows=5812 width=12) (actual time=434.849..434.962 rows=741 loops=2232)
Output: s_aud.id, PARTIAL max(s_aud.revision_id)
Group Key: s_aud.id
Buffers: shared hit=77348472
Worker 0: actual time=433.259..433.372 rows=740 loops=744
Buffers: shared hit=25657667
Worker 1: actual time=433.781..433.894 rows=740 loops=744
Buffers: shared hit=25621509
-> Parallel Seq Scan on db_schema.s_aud (cost=0.00..129536.26 rows=1304180 width=12) (actual time=0.017..285.222 rows=1039458 loops=2232)
Output: s_aud.id, s_aud.revision_id
Filter: ((s_aud.revision_type <> 2) AND ((s_aud.type)::text = 'X_TYPE'::text))
Rows Removed by Filter: 324437
Buffers: shared hit=77348472
Worker 0: actual time=0.007..283.757 rows=1034585 loops=744
Buffers: shared hit=25657667
Worker 1: actual time=0.007..284.260 rows=1033185 loops=744
Buffers: shared hit=25621509
Planning Time: 0.187 ms
CodePudding user response:
Your indexes are not appropriate. In table s you look for s.type = 'X_TYPE' AND s.active = true
, but there is no index on the columns it seems. In table s_aud you want revision_type <> 2 AND type = 'X_TYPE'
, and there is only an index on revision_type.
Have composite indexes or even partial indexes instead. Use the latter if you aways look at the same values, e.g. always at type = 'X_TYPE'
;
Composite indexes:
create index idx1 on s (type, active, id);
create index idx1 on s_aud (type, revision_type, id, revision_id);
Partial indexes:
create index idx3 on s (id) where type = 'X_TYPE' AND active = true;
create index idx4 on s_aud (id, revision_id) where revision_type <> 2 AND type = 'X_TYPE';
Aa a last option you could even partition your tables by type or active status.