Home > Software design >  Is it possible to accelerate this postgres / sql (read) query?
Is it possible to accelerate this postgres / sql (read) query?

Time:07-05

I have the following table:

CREATE TABLE mosmix_data (
   id SERIAL PRIMARY KEY, 
   created_at TIMESTAMP without time zone default (now() at time zone 'utc'),
   TimeStep timestamp, 
   name char(8),
   description text, 
   PPPP float8, 
   E_PPP float8, TX float8, TTT float8, E_TTT float8, Td float8,
   E_Td float8, TN float8, TG float8, TM float8, T5cm float8,
   DD float8, E_DD float8, FF float8, E_FF float8, FX1 float8,
   FX3 float8, FX625 float8, FX640 float8, FX655 float8, FXh float8,
   FXh25 float8, FXh40 float8, FXh55 float8, N float8, Neff float8,
   Nlm float8, Nh float8, Nm float8, Nl float8, N05 float8, VV float8,
   VV10 float8, wwM float8, wwM6 float8, wwMh float8, wwMd float8,
   ww float8, ww3 float8, W1W2 float8, wwP float8, wwP6 float8,
   wwPh float8, wwPd float8, wwZ float8, wwZ6 float8, wwZh float8,
   wwD float8, wwD6 float8, wwDh float8, wwC float8, wwC6 float8,
   wwCh float8, wwT float8, wwT6 float8, wwTh float8, wwTd float8,
   wwS float8, wwS6 float8, wwSh float8, wwL float8, wwL6 float8,
   wwLh float8, wwF float8, wwF6 float8, wwFh float8, DRR1 float8,
   RR6c float8, RRhc float8, RRdc float8, RR1c float8, RRS1c float8,
   RRL1c float8, RR3c float8, RRS3c float8, R101 float8, R102 float8,
   R103 float8, R105 float8, R107 float8, R110 float8, R120 float8,
   R130 float8, R150 float8, RR1o1 float8, RR1w1 float8, RR1u1 float8,
   R600 float8, R602 float8, R610 float8, R650 float8, Rh00 float8,
   Rh02 float8, Rh10 float8, Rh50 float8, Rd00 float8, Rd02 float8,
   Rd10 float8, Rd50 float8, SunD float8, RSunD float8, PSd00 float8,
   PSd30 float8, PSd60 float8, RRad1 float8, Rad1h float8, SunD1 float8,
   SunD3 float8, PEvap float8, WPc11 float8, WPc31 float8, WPc61 float8,
   WPch1 float8, WPcd1 float8,
Point Geometry(Point, 4326)
);

Also I created the following indices:

CREATE INDEX ON mosmix_data using gist (Point);
CREATE INDEX name_idx ON mosmix_data(name);

First I select a weather station near lat/long position

select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1;

Result looks like this:

 name   
----------
  G385   
(1 row)

Then I get the data of this station based on the name from previous query:

select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name like '%G385%' 
  and timestep between now() and now()   interval '4 days'  
order by timestep ASC;

And get the data.

Are anything I could do to make the query faster?

Here are the explainers:

 >explain select name from mosmix_data where timestep > now() order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' limit 1;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.41..1.99 rows=1 width=17)
   ->  Index Scan using mosmix_data_point_idx on mosmix_data  (cost=0.41..2294305.65 rows=1453301 width=17)
         Order By: (point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
         Filter: (timestep > now())
(4 rows)

And

 >explain select timestep, name, Rh02, TX, TN from mosmix_data where name like '%G385%' and timestep between now() and now()   interval '4 days'  order by timestep ASC;
                                                           QUERY PLAN                                                        
    -------------------------------------------------------------------------------------------------------------------------
     Gather Merge  (cost=391389.46..391395.31 rows=48 width=41)
       Workers Planned: 6
       ->  Sort  (cost=390389.36..390389.38 rows=8 width=41)
             Sort Key: timestep
             ->  Parallel Seq Scan on mosmix_data  (cost=0.00..390389.24 rows=8 width=41)
                   Filter: ((name ~~ '%G385%'::text) AND (timestep >= now()) AND (timestep <= (now()   '4 days'::interval)))
     JIT:
       Functions: 4
       Options: Inlining false, Optimization false, Expressions true, Deforming true
    (9 rows)

Thank you!

UPDATE as requested by @Frank Heikens added EXPLAIN(ANALYZE, VERBOSE, BUFFERS)

EXPLAIN(ANALYZE, VERBOSE, BUFFERS) select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1;
                                                                         QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.41..2.00 rows=1 width=17) (actual time=1.549..1.550 rows=1 loops=1)
   Output: name, ((point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry))
   Buffers: shared hit=152
   ->  Index Scan using mosmix_data_point_idx on public.mosmix_data  (cost=0.41..2261401.29 rows=1427043 width=17) (actual time=1.547..1.547 rows=1 loops=1)
         Output: name, (point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
         Order By: (mosmix_data.point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
         Filter: (mosmix_data.timestep > now())
         Rows Removed by Filter: 3
         Buffers: shared hit=152
 Planning Time: 0.201 ms
 Execution Time: 1.585 ms
(11 rows)

AND

Gather Merge  (cost=391380.39..391503.32 rows=1008 width=41) (actual time=169.483..195.812 rows=96 loops=1)
   Output: timestep, name, rh02, tx, tn
   Workers Planned: 6
   Workers Launched: 6
   Buffers: shared hit=384456
   ->  Sort  (cost=390380.29..390380.71 rows=168 width=41) (actual time=112.120..112.121 rows=14 loops=7)
         Output: timestep, name, rh02, tx, tn
         Sort Key: mosmix_data.timestep
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=384456
         Worker 0:  actual time=82.637..82.638 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 2.527 ms, Inlining 0.000 ms, Optimization 1.551 ms, Emission 20.164 ms, Total 24.242 ms
           Buffers: shared hit=19794
         Worker 1:  actual time=112.582..112.586 rows=96 loops=1
           Sort Method: quicksort  Memory: 32kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.420 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.794 ms, Total 14.211 ms
           Buffers: shared hit=40594
         Worker 2:  actual time=82.642..82.643 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 2.526 ms, Inlining 0.000 ms, Optimization 1.551 ms, Emission 20.164 ms, Total 24.242 ms
           Buffers: shared hit=19147
         Worker 3:  actual time=112.605..112.606 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.400 ms, Inlining 0.000 ms, Optimization 1.013 ms, Emission 11.582 ms, Total 13.995 ms
           Buffers: shared hit=40908
         Worker 4:  actual time=112.606..112.607 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.401 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.395 ms, Total 13.794 ms
           Buffers: shared hit=42131
         Worker 5:  actual time=112.604..112.604 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.406 ms, Inlining 0.000 ms, Optimization 1.016 ms, Emission 11.835 ms, Total 14.258 ms
           Buffers: shared hit=41171
         ->  Parallel Seq Scan on public.mosmix_data  (cost=0.00..390374.08 rows=168 width=41) (actual time=103.656..112.018 rows=14 loops=7)
  Timing: Generation 1.420 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.794 ms, Total 14.211 ms
           Buffers: shared hit=40594
         Worker 2:  actual time=82.642..82.643 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 2.526 ms, Inlining 0.000 ms, Optimization 1.551 ms, Emission 20.164 ms, Total 24.242 ms
           Buffers: shared hit=19147
         Worker 3:  actual time=112.605..112.606 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.400 ms, Inlining 0.000 ms, Optimization 1.013 ms, Emission 11.582 ms, Total 13.995 ms
           Buffers: shared hit=40908
         Worker 4:  actual time=112.606..112.607 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.401 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.395 ms, Total 13.794 ms
           Buffers: shared hit=42131
         Worker 5:  actual time=112.604..112.604 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.406 ms, Inlining 0.000 ms, Optimization 1.016 ms, Emission 11.835 ms, Total 14.258 ms
           Buffers: shared hit=41171
         ->  Parallel Seq Scan on public.mosmix_data  (cost=0.00..390374.08 rows=168 width=41) (actual time=103.656..112.018 rows=14 loops=7)
               Output: timestep, name, rh02, tx, tn
               Filter: ((mosmix_data.name ~~ '%G385%'::text) AND (mosmix_data.timestep >= now()) AND (mosmix_data.timestep <= (now()   '4 days'::interval)))
               Rows Removed by Filter: 210995
               Buffers: shared hit=384234
               Worker 0:  actual time=82.484..82.484 rows=0 loops=1
                 Buffers: shared hit=19757
               Worker 1:  actual time=53.929..112.467 rows=96 loops=1
                 Buffers: shared hit=40557
               Worker 2:  actual time=82.492..82.492 rows=0 loops=1
                 Buffers: shared hit=19110
               Worker 3:  actual time=112.507..112.507 rows=0 loops=1
                 Buffers: shared hit=40871
               Worker 4:  actual time=112.510..112.511 rows=0 loops=1
                 Buffers: shared hit=42094
               Worker 5:  actual time=112.509..112.509 rows=0 loops=1
                 Buffers: shared hit=41134
 Planning Time: 0.204 ms
 JIT:
   Functions: 28
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 11.963 ms, Inlining 0.000 ms, Optimization 8.332 ms, Emission 97.045 ms, Total 117.341 ms
 Execution Time: 197.158 ms
(75 rows)

UPDATE

So i followed every advice and i think it actually made the query solower :)

So I added following indices:

CREATE INDEX items_day_of_creation_idx ON mosmix_data ( (created_at::date) );
CREATE INDEX name_idx_gin ON mosmix_data USING GIN (name gin_trgm_ops);

I also changed the data type of column name to text. Furthermore I changed the query to CTE:

WITH name AS (select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1) select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name = name 
  and timestep between now() and now()   interval '4 days'  
order by timestep ASC;
     

I also installed pg_trgm

Here is the explainer:

mosmix=# EXPLAIN(ANALYZE, VERBOSE, BUFFERS) WITH name AS (select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1) select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name = name 
  and timestep between now() and now()   interval '4 days'  
order by timestep ASC;


Gather Merge  (cost=209808.68..279068.37 rows=572750 width=39) (actual time=282.161..387.589 rows=573312 loops=1)
   Output: timestep, name, rh02, tx, tn
   Workers Planned: 5
   Workers Launched: 5
   Buffers: shared hit=190898, temp read=3520 written=3539
   ->  Sort  (cost=208808.60..209094.98 rows=114550 width=39) (actual time=233.210..246.441 rows=95552 loops=6)
         Output: timestep, name, rh02, tx, tn
         Sort Key: mosmix_data.timestep
         Sort Method: external merge  Disk: 5952kB
         Buffers: shared hit=190898, temp read=3520 written=3539
         Worker 0:  actual time=223.592..236.370 rows=90733 loops=1
           Sort Method: external merge  Disk: 4456kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.453 ms, Inlining 0.000 ms, Optimization 1.033 ms, Emission 11.837 ms, Total 14.323 ms
           Buffers: shared hit=29938, temp read=557 written=560
         Worker 1:  actual time=223.059..238.316 rows=88448 loops=1
           Sort Method: external merge  Disk: 4344kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.444 ms, Inlining 0.000 ms, Optimization 1.028 ms, Emission 11.926 ms, Total 14.398 ms
           Buffers: shared hit=29751, temp read=543 written=546
         Worker 2:  actual time=223.691..236.821 rows=91575 loops=1
           Sort Method: external merge  Disk: 4496kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.465 ms, Inlining 0.000 ms, Optimization 1.035 ms, Emission 11.841 ms, Total 14.341 ms
           Buffers: shared hit=30630, temp read=562 written=565
         Worker 3:  actual time=223.555..236.986 rows=90809 loops=1
           Sort Method: external merge  Disk: 4464kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.446 ms, Inlining 0.000 ms, Optimization 1.035 ms, Emission 11.837 ms, Total 14.318 ms
           Buffers: shared hit=30770, temp read=558 written=561
         Worker 4:  actual time=223.505..237.041 rows=90596 loops=1
           Sort Method: external merge  Disk: 4448kB
           JIT:
             Functions: 4
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 1.447 ms, Inlining 0.000 ms, Optimization 1.034 ms, Emission 11.897 ms, Total 14.378 ms
           Buffers: shared hit=30083, temp read=556 written=559
         ->  Parallel Seq Scan on public.mosmix_data  (cost=0.00..197348.43 rows=114550 width=39) (actual time=13.155..203.203 rows=95552 loops=6)
               Output: timestep, name, rh02, tx, tn
               Filter: ((mosmix_data.name IS NOT NULL) AND (mosmix_data.timestep >= now()) AND (mosmix_data.timestep <= (now()   '4 days'::interval)))
               Rows Removed by Filter: 150295
               Buffers: shared hit=190713
               Worker 0:  actual time=13.184..194.421 rows=90733 loops=1
                 Buffers: shared hit=29901
               Worker 1:  actual time=13.227..195.179 rows=88448 loops=1
                 Buffers: shared hit=29714
               Worker 2:  actual time=13.178..194.910 rows=91575 loops=1
                 Buffers: shared hit=30593
Planning Time: 0.307 ms
 JIT:
   Functions: 24
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 8.992 ms, Inlining 0.000 ms, Optimization 6.265 ms, Emission 70.878 ms, Total 86.135 ms
 Execution Time: 407.614 ms
(66 rows)

UPDATE

the new reason for slowdown is the CTE. Without it but with other measures the performance improved dramatically:

EXPLAIN(ANALYZE, VERBOSE, BUFFERS) select name from mosmix_data where timestep > now() order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' limit 1;
    QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.71 rows=1 width=15) (actual time=0.738..0.738 rows=1 loops=1)
   Output: name, ((point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry))
   Buffers: shared hit=136
   ->  Index Scan using mosmix_data_point_idx on public.mosmix_data  (cost=0.29..2070165.86 rows=1456434 width=15) (actual time=0.736..0.736 rows=1 loops=1)
         Output: name, (point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
         Order By: (mosmix_data.point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
         Filter: (mosmix_data.timestep > now())
         Rows Removed by Filter: 2
         Buffers: shared hit=136
 Planning Time: 0.142 ms
 Execution Time: 0.763 ms
(11 rows)

AND

mosmix=# EXPLAIN(ANALYZE, VERBOSE, BUFFERS) select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name='G385' 
  and timestep between now() and now()   interval '4 days'  
order by timestep ASC;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=274.79..275.03 rows=97 width=39) (actual time=0.086..0.087 rows=0 loops=1)
   Output: timestep, name, rh02, tx, tn
   Sort Key: mosmix_data.timestep
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=3
   ->  Index Scan using name_idx on public.mosmix_data  (cost=0.43..271.59 rows=97 width=39) (actual time=0.080..0.080 rows=0 loops=1)
         Output: timestep, name, rh02, tx, tn
         Index Cond: (mosmix_data.name = 'G385'::text)
         Filter: ((mosmix_data.timestep >= now()) AND (mosmix_data.timestep <= (now()   '4 days'::interval)))
         Buffers: shared hit=3
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.455 ms
 Execution Time: 0.118 ms
(14 rows)

CodePudding user response:

The simple way is to create an index on timestamp then name:

CREATE INDEX my_idx ON mosmix_data(timestamp, name)

timestamp is a good candidate for the first column of the index because that's the tightest condition in your where clause (within a range of 4 days).

By putting name in the index too, the like condition, which will not use an index because the first character is unknown (it's %), can be executed on the value in the index rather than having to retrieve the row from the table.

  • Related