Optimize subquery with max postgresql-CodePudding

I'm using PostgreSQL 12.8 and I have two tables:

CREATE TABLE s (
    id     BIGINT PRIMARY KEY,
    type   VARCHAR,
    active BOOLEAN
);

With more than 230k rows, and

CREATE TABLE s_aud (
    id            BIGINT NOT NULL,
    type          VARCHAR,
    revision_id   INT4 NOT NULL,
    revision_type SMALLINT NOT NULL,
    CONSTRAINT s_aud_pk PRIMARY KEY (id, revision_id)
);

that contains more than 4M rows and is an apprend-only table where we store every add, update or delete operation done in s table, s_aud table does not contain any FK to s table. The problem is that I would like to execute the following query:

SELECT s.*, a.revision_id 
FROM s
JOIN (
  SELECT id, MAX(revision_id) AS revision_id 
  FROM s_aud 
  WHERE revision_type <> 2 AND type = 'X_TYPE'
  GROUP BY id
) a ON s.id = a.id 
WHERE s.type = 'X_TYPE' AND s.active = true;

Which is something like, get the latest revision_id of revision_type 2 of every id in the table.

If I execute the query, it takes more than 10 minutes to execute which is not acceptable, how can I improve it? I tried to add an index to:

CREATE INDEX s_aud_id_idx ON s_aud (id);
CREATE INDEX s_aud_revision_type_idx ON s_aud (revision_type);

But didn't affected to the performance of the query, any ideas?

EDIT, EXPLAIN WITH (ANALYZE, VERBOSE, BUFFERS, format text)

QUERY PLAN                                                                                                                                                                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=138478.70..149103.60 rows=1 width=238) (actual time=432.466..327417.023 rows=744 loops=1)                                                                                                                             
  Output: s.id, s.type, s.active, (max(s_aud.revision_id))
  Inner Unique: true                                                                                                                                                                                                                     
  Join Filter: (s.id = s_aud.id)                                                                                                                                                                                                   
  Rows Removed by Join Filter: 276396                                                                                                                                                                                                    
  Buffers: shared hit=77360594 read=7600                                                                                                                                                                                                 
  I/O Timings: read=22.744                                                                                                                                                                                                               
  ->  Gather  (cost=1000.00..10021.67 rows=1 width=234) (actual time=0.296..1.279 rows=744 loops=1)                                                                                                                                      
        Output: s.id, s.type, s.active           
        Workers Planned: 2                                                                                                                                                                                                               
        Workers Launched: 2                                                                                                                                                                                                              
        Buffers: shared hit=218 read=7600                                                                                                                                                                                                
        I/O Timings: read=22.744                                                                                                                                                                                                         
        ->  Parallel Seq Scan on db_schema.s (cost=0.00..9021.57 rows=1 width=234) (actual time=0.077..23.769 rows=248 loops=3)                                                                                      
              Output: s.id, s.type, s.active     
              Filter: (s.active AND ((s.type)::text = 'X_TYPE'::text))                                                                                                                                                            
              Rows Removed by Filter: 76643                                                                                                                                                                                              
              Buffers: shared hit=218 read=7600                                                                                                                                                                                          
              I/O Timings: read=22.744                                                                                                                                                                                                   
              Worker 0: actual time=0.132..36.318 rows=463 loops=1                                                                                                                                                                       
                Buffers: shared hit=217 read=3811                                                                                                                                                                                        
                I/O Timings: read=11.326                                                                                                                                                                                                 
              Worker 1: actual time=0.016..34.903 rows=280 loops=1                                                                                                                                                                       
                Buffers: shared read=3785                                                                                                                                                                                                
                I/O Timings: read=11.401                                                                                                                                                                                                 
  ->  Finalize GroupAggregate  (cost=137478.70..138951.16 rows=5812 width=12) (actual time=439.679..440.021 rows=372 loops=744)                                                                                                          
        Output: s_aud.id, max(s_aud.revision_id)                                                                                                                                                                             
        Group Key: s_aud.id                                                                                                                                                                                                        
        Buffers: shared hit=26138272                                                                                                                                                                                                     
        ->  Gather Merge  (cost=137478.70..138834.92 rows=11624 width=12) (actual time=439.672..439.859 rows=1111 loops=744)                                                                                                             
              Output: s_aud.id, (PARTIAL max(s_aud.revision_id))                                                                                                                                                             
              Workers Planned: 2                                                                                                                                                                                                         
              Workers Launched: 2                                                                                                                                                                                                        
              Buffers: shared hit=26138272                                                                                                                                                                                               
              ->  Sort  (cost=136478.67..136493.20 rows=5812 width=12) (actual time=435.151..435.183 rows=581 loops=2232)                                                                                                                
                    Output: s_aud.id, (PARTIAL max(s_aud.revision_id))                                                                                                                                                       
                    Sort Key: s_aud.id                                                                                                                                                                                             
                    Sort Method: quicksort  Memory: 59kB                                                                                                                                                                                 
                    Worker 0:  Sort Method: quicksort  Memory: 59kB                                                                                                                                                                      
                    Worker 1:  Sort Method: quicksort  Memory: 59kB                                                                                                                                                                      
                    Buffers: shared hit=77360376                                                                                                                                                                                         
                    Worker 0: actual time=433.575..433.613 rows=689 loops=744                                                                                                                                                            
                      Buffers: shared hit=25663619                                                                                                                                                                                       
                    Worker 1: actual time=434.099..434.136 rows=682 loops=744                                                                                                                                                            
                      Buffers: shared hit=25627461                                                                                                                                                                                       
                    ->  Partial HashAggregate  (cost=136057.16..136115.28 rows=5812 width=12) (actual time=434.849..434.962 rows=741 loops=2232)                                                                                         
                          Output: s_aud.id, PARTIAL max(s_aud.revision_id)                                                                                                                                                   
                          Group Key: s_aud.id                                                                                                                                                                                      
                          Buffers: shared hit=77348472                                                                                                                                                                                   
                          Worker 0: actual time=433.259..433.372 rows=740 loops=744                                                                                                                                                      
                            Buffers: shared hit=25657667                                                                                                                                                                                 
                          Worker 1: actual time=433.781..433.894 rows=740 loops=744                                                                                                                                                      
                            Buffers: shared hit=25621509                                                                                                                                                                                 
                          ->  Parallel Seq Scan on db_schema.s_aud (cost=0.00..129536.26 rows=1304180 width=12) (actual time=0.017..285.222 rows=1039458 loops=2232)                                                   
                                Output: s_aud.id, s_aud.revision_id                                                                                                                                                          
                                Filter: ((s_aud.revision_type <> 2) AND ((s_aud.type)::text = 'X_TYPE'::text))                                                                                                        
                                Rows Removed by Filter: 324437                                                                                                                                                                           
                                Buffers: shared hit=77348472                                                                                                                                                                             
                                Worker 0: actual time=0.007..283.757 rows=1034585 loops=744                                                                                                                                              
                                  Buffers: shared hit=25657667                                                                                                                                                                           
                                Worker 1: actual time=0.007..284.260 rows=1033185 loops=744                                                                                                                                              
                                  Buffers: shared hit=25621509                                                                                                                                                                           
Planning Time: 0.187 ms

CodePudding user response：

Your indexes are not appropriate. In table s you look for s.type = 'X_TYPE' AND s.active = true, but there is no index on the columns it seems. In table s_aud you want revision_type <> 2 AND type = 'X_TYPE', and there is only an index on revision_type.

Have composite indexes or even partial indexes instead. Use the latter if you aways look at the same values, e.g. always at type = 'X_TYPE';

Composite indexes:

create index idx1 on s (type, active, id);
create index idx1 on s_aud (type, revision_type, id, revision_id);

Partial indexes:

create index idx3 on s (id) where type = 'X_TYPE' AND active = true;
create index idx4 on s_aud (id, revision_id) where revision_type <> 2 AND type = 'X_TYPE';

Aa a last option you could even partition your tables by type or active status.