postgresql st_contains performance

SELECT
  a.geom, 'tk' category,
  ROUND(avg(tk), 1) tk
FROM
  tb_grid_4326_100m a left outer join 
(
  SELECT
    tk-273.15 tk, geom
  FROM
    tb_points
  WHERE
    hour = '23'
) b ON st_contains(a.geom, b.geom)
GROUP BY
  a.geom

QUERY PLAN                                                                                                                                                          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Finalize GroupAggregate  (cost=54632324.85..54648025.25 rows=50698 width=184) (actual time=8522.042..8665.129 rows=50698 loops=1)                                   |
  Group Key: a.geom                                                                                                                                                 |
  ->  Gather Merge  (cost=54632324.85..54646504.31 rows=101396 width=152) (actual time=8522.032..8598.567 rows=50698 loops=1)                                       |
        Workers Planned: 2                                                                                                                                          |
        Workers Launched: 2                                                                                                                                         |
        ->  Partial GroupAggregate  (cost=54631324.83..54633800.68 rows=50698 width=152) (actual time=8490.577..8512.725 rows=16899 loops=3)                        |
              Group Key: a.geom                                                                                                                                     |
              ->  Sort  (cost=54631324.83..54631785.36 rows=184212 width=130) (actual time=8490.557..8495.249 rows=16996 loops=3)                                   |
                    Sort Key: a.geom                                                                                                                                |
                    Sort Method: external merge  Disk: 2296kB                                                                                                       |
                    Worker 0:  Sort Method: external merge  Disk: 2304kB                                                                                            |
                    Worker 1:  Sort Method: external merge  Disk: 2296kB                                                                                            |
                    ->  Nested Loop Left Join  (cost=0.41..54602621.56 rows=184212 width=130) (actual time=1.729..8475.942 rows=16996 loops=3)                      |
                          ->  Parallel Seq Scan on tb_grid_4326_100m a  (cost=0.00..5866.24 rows=21124 width=120) (actual time=0.724..2.846 rows=16899 loops=3)     |
                          ->  Index Scan using sidx_tb_points on tb_points  (cost=0.41..2584.48 rows=10 width=42) (actual time=0.351..0.501 rows=1 loops=50698)|
                                Index Cond: (((hour)::text = '23'::text) AND (geom @ a.geom))                                                                       |
                                Filter: st_contains(a.geom, geom)                                                                                                   |
                                Rows Removed by Filter: 0                                                                                                           |
Planning Time: 1.372 ms                                                                                                                                             |
Execution Time: 8667.418 ms                                                                                                                                         |

I want to join a 100 m grid table to a points table using the st_contains function. The 100 m grid table has 75,769 records, and the tb_points table has 2,434,536 records. When a time condition is given, the tb_points table returns about 100,000 records. (As a result, about 75,000 grid records are joined against about 100,000 point records.)

(Index information) The 100 m grid table has an index using gist(geom).

The tb_points table has an index using gist(hour, geom).

It took 30 seconds. How can I improve the performance?

CodePudding user response：

It is hard to give a definitive answer, but here are several things you can try:

For a multicolumn gist index, it is often a good idea to put the most selectively used column first. In your case, that would have the index be on (geom, hour), not (hour, geom). On the other hand, it can also be better to put the faster column first, and testing for scalar equality should be much faster than testing for containment. You would have to do the test and see which factor is more important for you.
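One way to run that comparison is sketched below. It assumes the btree_gist extension is already installed (the existing gist(hour, geom) index suggests it is), and the index name sidx_tb_points_geom_hour is made up for illustration. The existing index is hidden inside a transaction so the planner has to use the new one, then restored with ROLLBACK.

```sql
-- Assumption: sidx_tb_points_geom_hour is a hypothetical name for the new index.
CREATE INDEX sidx_tb_points_geom_hour ON tb_points USING gist (geom, hour);
ANALYZE tb_points;

-- Hide the existing (hour, geom) index for the duration of one transaction.
-- DROP INDEX holds an exclusive lock on tb_points until ROLLBACK, so try this
-- on a test copy or during a quiet period.
BEGIN;
DROP INDEX sidx_tb_points;

EXPLAIN (ANALYZE, BUFFERS)
SELECT a.geom, 'tk' AS category, ROUND(avg(b.tk), 1) AS tk
FROM tb_grid_4326_100m a
LEFT JOIN (
  SELECT tk - 273.15 AS tk, geom
  FROM tb_points
  WHERE hour = '23'
) b ON st_contains(a.geom, b.geom)
GROUP BY a.geom;

ROLLBACK;  -- the original index is back, nothing was actually dropped
```

Compare the "actual time" of the inner index scan and the buffer counts with the plan in the question, then keep whichever index performs better.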

You could try for an index-only scan, which doesn't need to visit the table. That could save a lot of random I/O. To do that you would need the index gist (hour, geom) INCLUDE (tk, geom). The geom column in a gist index is not considered to be "returnable", so it also needs to be put in the INCLUDE part in order to get the index-only scan.
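A minimal sketch of that covering index follows. The index name is hypothetical, and INCLUDE on a gist index requires PostgreSQL 12 or later. An index-only scan also depends on the visibility map being up to date, which is why the table is vacuumed afterwards.

```sql
-- Assumption: sidx_tb_points_hour_geom_incl is a hypothetical index name.
CREATE INDEX sidx_tb_points_hour_geom_incl
  ON tb_points USING gist (hour, geom) INCLUDE (tk, geom);

-- Refresh the visibility map and statistics so the planner can choose
-- an index-only scan without heap fetches.
VACUUM (ANALYZE) tb_points;
```

If it works, re-running EXPLAIN (ANALYZE) on the query should show "Index Only Scan" on tb_points instead of "Index Scan", ideally with a low "Heap Fetches" count.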

Finally, you could partition the table tb_points on "hour". Then you wouldn't need to put "hour" into the gist index, as it is already fulfilled by the partitioning.
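As a rough sketch of the partitioning idea, assuming PostgreSQL 11 or later and that hour holds text values such as '23' (as the plan's Index Cond suggests): the table and partition names below are hypothetical, and only one hour partition is shown.

```sql
-- Assumption: tb_points_part and its partition names are hypothetical.
-- INCLUDING DEFAULTS is used rather than INCLUDING ALL because a primary key
-- or unique constraint on a partitioned table must contain the partition key.
CREATE TABLE tb_points_part (LIKE tb_points INCLUDING DEFAULTS)
  PARTITION BY LIST (hour);

-- One partition per hour value; repeat for the other hours.
CREATE TABLE tb_points_part_h23 PARTITION OF tb_points_part
  FOR VALUES IN ('23');

-- Catch-all for any hour value without its own partition.
CREATE TABLE tb_points_part_default PARTITION OF tb_points_part DEFAULT;

-- The spatial index no longer needs the hour column; it is created on
-- every partition automatically.
CREATE INDEX ON tb_points_part USING gist (geom);

INSERT INTO tb_points_part SELECT * FROM tb_points;
ANALYZE tb_points_part;
```

With this layout, a query filtering on hour = '23' should be pruned to the matching partition before the spatial index is consulted.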

And these can be mixed and matched, so you could also swap the column order in the INCLUDE index, or you could try to get both partitioning and the INCLUDE index working together.