PostgreSQL: running SQL statements in parallel in a SQL script-CodePudding

I use a for loop in a script to compute geometries according to values in a series, and insert the results into a table (t1). Is there a way to rewrite the loop below using parallelism? What approach to use? I found many docs on how PostgreSQL runs queries in parallel, but hardy anything on how I could perform such task in my case. Is it even possible using SQL? or should I rely on other languages like Python, and execute the SQL queries from there?

DO $$
DECLARE
  lims int[];
  lim int;
BEGIN
  SELECT ARRAY(SELECT generate_series(60, 300, 60)) INTO lims;
  CREATE TABLE t1(
    lim integer,
    the_geom geometry(polygon, 4326),
  );
  FOREACH lim IN ARRAY lims
  LOOP
    WITH conv_hull AS (
      SELECT ST_ConcaveHull(...I use lim here...) AS conv_hull
    )
    INSERT INTO t1 (lim, the_geom) SELECT lim, ST_SetSRID(conv_hull.ST_ConcaveHull, 4326) FROM conv_hull;
  END LOOP;
END $$;

CodePudding user response：

Parallelism in Postgres is something that the query planner does for you to process big, qualifying SQL statements. You cannot launch parallel operations on demand in PL/pgSQL.

You could rewrite your example to replace the PL/pgSQL loop with a set-based solution in plain SQL, like:

BEGIN;

CREATE TABLE t1(
  lim integer
, the_geom geometry(polygon, 4326)
);

WITH conv_hull AS (
   SELECT ST_ConcaveHull(...) AS conv_hull
   )
INSERT INTO t1 (lim, the_geom)
SELECT generate_series(1,10), ST_SetSRID(conv_hull.ST_ConcaveHull, 4326)
FROM   conv_hull;

COMMIT;

If your server is configured properly, the query gets big enough, and all involved operations allow parallelism, then the planner will schedule multiple worker processes - and actually fire them up if resources are available.

generate_series() is PARALLEL SAFE, for starters. Read details in the manual chapter "Parallel Query".

CodePudding user response：

Alright, thanks! I got it working with the full statement:

SELECT ST_ConcaveHull(the_geom, 0.95) FROM (
  SELECT ST_Union(b.the_geom) AS the_geom
  FROM generate_series(60, 300, 60) AS lim, pgr_drivingDistance(
    'SELECT id, source, target, cost, reverse_cost FROM edge_net', vid, lim, true
  ) a, nodes b WHERE a.node = b.vid
  GROUP BY lim
) AS conv_hull

Same result as the for loop now. However, I don't get a significant speed improvement for the computation which was the whole point for me to find another solution. Probably, parallel processing is not trigged for some reason (although it's configured in the PG conf file)...