Home > Enterprise >  Is it possible to accelerate this postgress / postgis statement?
Is it possible to accelerate this postgress / postgis statement?

Time:01-09

I have a Postgres database with a postgis extention installed and filles with open street map data.

With the following SQL statement :

SELECT                                                                
    l.osm_id,
    sum(
        st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
        /
        st_area(ST_Buffer(l.way, 30))
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;

I calculate a "green" score.

My goal is to create a "green" score for each osm_id.

Which means; how much of a road is near a water, forrest or something similar.

To do so:

I Create a 30 meter buffer around each way, and calculate the intersection between that buffered way and any green features nearby.

I am using 'green features' to refer to polygons in OpenStreetMap's database, such as a park.

Is it possbile to accelerate this calculation?

One thing I id is to create 2 indices in a hope to accelerate the calculation:

CREATE INDEX way_index_2 on planet_osm_polygon USING gist(way) WHERE "natural" IN ('water','wood','forest','hill','valley');
CREATE INDEX way_index_3 on planet_osm_polygon USING gist(way) WHERE "landuse" IN ('forest');

Here is an "explain" to this statement:

EXPLAIN (ANALYZE, BUFFERS) SELECT                                                                
    l.osm_id,
    sum(
        st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
        /
        st_area(ST_Buffer(l.way, 30))
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id limit 1;
                                                                                       QUERY PLAN                                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17816.83..133789235.22 rows=1 width=16) (actual time=1575643.737..1575651.862 rows=1 loops=1)
   Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
   ->  GroupAggregate  (cost=17816.83..1435958589062981.00 rows=10734420 width=16) (actual time=1575643.723..1575651.847 rows=1 loops=1)
         Group Key: l.osm_id
         Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
         ->  Nested Loop  (cost=17816.83..1433802261939271.50 rows=28652652777 width=448) (actual time=978502.788..1575648.857 rows=8 loops=1)
               Join Filter: st_intersects(l.way, st_buffer(p.way, '30'::double precision, ''::text))
               Rows Removed by Join Filter: 6528525
               Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
               ->  Index Scan using osm_id_idx on planet_osm_line l  (cost=0.44..1242021.57 rows=22671610 width=247) (actual time=5.963..6.181 rows=6 loops=1)
                     Buffers: shared hit=5 read=3
               ->  Materialize  (cost=17816.39..1445364.98 rows=1263812 width=201) (actual time=85.181..4605.942 rows=1088089 loops=6)
                     Buffers: shared hit=380236 read=220348, temp read=313661 written=73938
                     ->  Gather  (cost=17816.39..1403253.92 rows=1263812 width=201) (actual time=510.609..1066.182 rows=1250378 loops=1)
                           Workers Planned: 4
                           Workers Launched: 4
                           Buffers: shared hit=380236 read=220348
                           ->  Parallel Bitmap Heap Scan on planet_osm_polygon p  (cost=16816.39..1275872.72 rows=315953 width=201) (actual time=447.168..9410.838 rows=250076 loops=5)
                                 Recheck Cond: (("natural" = ANY ('{water,wood,forest,hill,valley}'::text[])) OR (landuse = 'forest'::text))
                                 Rows Removed by Index Recheck: 2554266
                                 Filter: (("natural" = 'water'::text) OR (landuse = 'forest'::text))
                                 Rows Removed by Filter: 53217
                                 Heap Blocks: lossy=1
                                 Buffers: shared hit=380236 read=220348
                                 ->  BitmapOr  (cost=16816.39..16816.39 rows=1554297 width=0) (actual time=491.891..491.893 rows=0 loops=1)
                                       Buffers: shared hit=7797
                                       ->  Bitmap Index Scan on way_index_2  (cost=0.00..7822.79 rows=750359 width=0) (actual time=413.690..413.690 rows=737741 loops=1)
                                             Buffers: shared hit=3758
                                       ->  Bitmap Index Scan on way_index_3  (cost=0.00..8361.69 rows=803938 width=0) (actual time=78.198..78.198 rows=783702 loops=1)
                                             Buffers: shared hit=4039
 Planning Time: 0.315 ms
 Execution Time: 1575673.609 ms
(32 rows)

UPDATE:

Here is a short schema of the both tables;

                                                Table 

"public.planet_osm_line"
       Column       |           Type            | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------------- --------------------------- ----------- ---------- --------- ---------- ------------- -------------- -------------
 osm_id             | bigint                    |           |          |         | plain    |             |              | 
 access             | text                      |           |          |         | extended |             |              | 
 addr:housename     | text                      |           |          |         | extended |             |              | 
 addr:housenumber   | text                      |           |          |         | extended |             |              | 
 addr:interpolation | text                      |           |          |         | extended |             |              | 
 admin_level        | text                      |           |          |         | extended |             |              | 
....
natural            | text                      |           |          |         | extended |             |              | 



Indexes:
    "highway_idx" btree (highway)
    "motorway_idx" gist (way) WHERE highway = 'motorway'::text
    "motorway_trunk_primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['motorway'::text, 'trunk'::text, 'primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
    "name_idx" btree (name)
    "osm_id_idx" btree (osm_id)
    "planet_osm_line_osm_id_idx" btree (osm_id)
    "planet_osm_line_way_idx" gist (way)
    "primary_idx" gist (way) WHERE highway = 'primary'::text
    "primary_secondary_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text])
    "primary_secondary_tertiary_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text, 'tertiary'::text])
    "primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
    "secondary_idx" gist (way) WHERE highway = 'secondary'::text
    "secondary_tertiary_idx" gist (way) WHERE highway = ANY (ARRAY['secondary'::text, 'tertiary'::text])
    "tertiary_idx" gist (way) WHERE highway = 'tertiary'::text
    "tertiary_secondary_idx" gist (way) WHERE highway = ANY (ARRAY['tertiary'::text, 'unclassified'::text])
    "trunk_idx" gist (way) WHERE highway = 'trunk'::text
    "trunk_primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['trunk'::text, 'primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
    "unclassified_idx" gist (way) WHERE highway = 'unclassified'::text
    "way_idx" gist (way)
    "way_index_1" gist (way)
    "way_index_4" gist (way) WHERE "natural" = ANY (ARRAY['water'::text, 'wood'::text, 'forest'::text, 'hill'::text, 'valley'::text])
    "way_index_5" gist (way) WHERE landuse = 'forest'::text
Triggers:
    planet_osm_line_osm2pgsql_valid BEFORE INSERT OR UPDATE ON planet_osm_line FOR EACH ROW EXECUTE FUNCTION planet_osm_line_osm2pgsql_valid()
Access method: heap

And:

                                                 Table "public.planet_osm_polygon"
       Column       |          Type           | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------------- ------------------------- ----------- ---------- --------- ---------- ------------- -------------- -------------
 osm_id             | bigint                  |           |          |         | plain    |             |              | 
 access             | text                    |           |          |         | extended |             |              | 
 addr:housename     | text                    |           |          |         | extended |             |              | 
 addr:housenumber   | text                    |           |          |         | extended |             |              | 
 addr:interpolation | text                    |           |          |         | extended |             |              | 
 admin_level        | text                    |           |          |         | extended |             |              | 
 aerialway          | text                    |           |          |         | extended |             |              | 
 aeroway            | text                    |           |          |         | extended |             |              | 
 amenity            | text                    |           |          |         | extended |             |              | 
 area               | text                    |           |          |         | extended |             |              | 
 barrier            | text                    |           |          |         | extended |             |              | 
landuse            | text                    |           |          |         | extended |             |              | 


Indexes:
    "fuel_toilet_parking_restaurant_idex" gist (way) WHERE amenity = ANY (ARRAY['fuel'::text, 'toilets'::text, 'parking'::text, 'restaurant'::text, 'cafe'::text, 'pub'::text, 'ice_cream'::text, 'biergarten'::text])
    "planet_osm_polygon_osm_id_idx" btree (osm_id)
    "planet_osm_polygon_way_idx" gist (way)
    "viewpoint_attraction_guest_house_idex" gist (way) WHERE tourism = ANY (ARRAY['viewpoint '::text, 'attraction'::text, 'guest_house'::text])
    "way_index_2" gist (way) WHERE "natural" = ANY (ARRAY['water'::text, 'wood'::text, 'forest'::text, 'hill'::text, 'valley'::text])
    "way_index_3" gist (way) WHERE landuse = 'forest'::text
Triggers:
    planet_osm_polygon_osm2pgsql_valid BEFORE INSERT OR UPDATE ON planet_osm_polygon FOR EACH ROW EXECUTE FUNCTION planet_osm_polygon_osm2pgsql_valid()
Access method: heap

UPDATE:

planet_osm_polygon

wood               | text                    |           |          |                                                     | extended |             |              | 
 z_order            | integer                 |           |          |                                                     | plain    |             |              | 
 way_area           | real                    |           |          |                                                     | plain    |             |              | 
 way                | geometry(Geometry,3857) |           |          |                                                     | main     |             |              | 
 way_buffer_30      | geometry(Polygon)       |           |          |                                                     | external |             |              | 
 way_buffer_30_area | numeric                 |           |          | generated always as (st_area(way_buffer_30)) stored | main     |             |              | 

AND

planet_osm_line

way_area           | real                      |           |          |         | plain    |             |              | 
 way                | geometry(LineString,3857) |           |          |         | main     |             |              | 

CodePudding user response:

Not really an answer as much as a bunch of comments too long for a comment format:

alter system set default_toast_compression=lz4;

alter table planet_osm_polygon 
    add column way_buffer_30 geometry, 
    alter column way_buffer_30 set storage external,
    add column way_buffer_30_area numeric 
        generated always as (st_area(way_buffer_30)) stored;
update planet_osm_polygon 
    set way_buffer_30=st_buffer(way,30,'quad_segs=1');
create index osmp_way_buffer_30_gix on planet_osm_polygon 
    using gist(way_buffer_30) with (fillfactor=100);
cluster verbose planet_osm_polygon using osmp_way_buffer_30_gix;

alter table planet_osm_line
    add column way_buffer_30 geometry, 
    alter column way_buffer_30 set storage external,
    add column way_buffer_30_area numeric 
        generated always as (st_area(way_buffer_30)) stored; 
update planet_osm_line
    set way_buffer_30=st_buffer(way,30,'quad_segs=1');
create index osml_way_buffer_30_gix on planet_osm_line
    using gist(way_buffer_30) with (fillfactor=100);
cluster verbose planet_osm_polygon using osml_way_buffer_30_gix;

SELECT                                                                
    l.osm_id,
    sum(st_area(st_intersection(l.way_buffer_30, p.way))
        / l.way_buffer_30_area
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, p.way_buffer_30)
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;
  1. default_toast_compression=lz4 You're working with polygons, which are likely to be compressed and TOASTed. Default default_toast_compression=pglz is typically slower than lz4. Note that you need to force a re-write of those tables/columns after altering the setting, otherwise it'll be apply as a default only from that point onwards, not affecting anything retroactively.
  2. storage external skips one step in retrieving the data if the shapes are big/complex enough. Might be worth it depending on your PostGIS version.
  3. st_buffer(way,30,'quad_segs=1') you can tweak the third parameter to get a simpler shape, that's easier to compare against. Default quad_segs=8 can cause your buffer to have 8x more vertices than the input.
  4. with (fillfactor=100) I assume it's a static source table that you replace when a new version is published, rather than maintain the current one yourself. Therefore, indexes can be made static, too (default fillfactor=90 is meant to account for new rows inserted into the table).
  5. cluster aligns table pages with the index, making heap fetches faster.
  6. An index on the final buffer shape you want to use will be better than an index on a column you plan to use to create it using ST_Buffer(). Same goes for ST_Transform() - apply it on the column before indexing, not in the query.
  7. ST_Subdivide all the Things.
  8. You can test if ST_DWithin() speeds up the join compared to your current code and the one suggested above. Be sure to test with a suitable index. Keep in mind the unit of distance depends on the column SRID - if you're using metric you'll get anomalies from systems in imperial/nautical/degrees.
    INNER JOIN planet_osm_polygon AS p ON ST_DWithin(l.way,p.way,30)
    
  9. You can check if your partial index is at all worth it, and see if it performs better if narrow it down for the purpose of this query
    create index way_buffer_30_gix on planet_osm_polygon 
    using gist(way_buffer_30) with (fillfactor=100)
    where natural='water' or landuse='forest';
    
  10. Most of the above only makes sense if that's a one-off operation or if you can copy the planet_osm_polygon table so that you don't have to deal with other indexes and settings that are supposed to improve performance of operations that you do on that table more often.
  11. If it's really slow, extract a subset of rows from both tables and run your tests on those. To get more reliable results, use pgbench to run your benchmarks.
  • Related