Home > Enterprise >  Dynamic query that uses CTE gets "syntax error at end of input"
Dynamic query that uses CTE gets "syntax error at end of input"

Time:09-28

I have a table that looks like this:

CREATE TABLE label (
  hid  UUID PRIMARY KEY DEFAULT UUID_GENERATE_V4(),
  name TEXT NOT NULL UNIQUE
);

I want to create a function that takes a list of names and inserts multiple rows into the table, ignoring duplicate names, and returns an array of the IDs generated for the rows it inserted.
This works:

CREATE OR REPLACE FUNCTION insert_label(nms TEXT[])
  RETURNS UUID[]
AS $$
DECLARE 
  ids UUID[];
BEGIN
  CREATE TEMP TABLE tmp_names(name TEXT);
  INSERT INTO tmp_names SELECT UNNEST(nms);
  
  WITH new_names AS (
    INSERT INTO label(name)
    SELECT tn.name
    FROM tmp_names tn
    WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name)
    RETURNING hid
    )
  SELECT ARRAY_AGG(hid) INTO ids
  FROM new_names;

  DROP TABLE tmp_names;
    
  RETURN ids;
END;
$$ LANGUAGE PLPGSQL;

I have many tables with the exact same columns as the label table, so I would like to have a function that can insert into any of them. I'd like to create a dynamic query to do that. I tried that, but this does not work:

CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
  RETURNS UUID[]
AS $$
DECLARE 
    ids UUID[];
    query_str TEXT;
BEGIN
  CREATE TEMP TABLE tmp_names(name TEXT);
  INSERT INTO tmp_names SELECT UNNEST(nms);
 
   query_str := FORMAT('WITH new_names AS ( INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid)', h_tbl); 
  
  EXECUTE query_str;
    
  SELECT ARRAY_AGG(hid) INTO ids FROM new_names;
  DROP TABLE tmp_names;
    
  RETURN ids;
END;
$$ LANGUAGE PLPGSQL;

This is the output I get when I run that function:

psql=# select insert_label('label', array['how', 'now', 'brown', 'cow']);
ERROR:  syntax error at end of input
LINE 1: ...SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
                                                                       ^
QUERY:  WITH new_names AS ( INSERT INTO label(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
CONTEXT:  PL/pgSQL function insert_label(regclass,text[]) line 19 at EXECUTE

The query generated by the dynamic SQL looks like it should be exactly the same as the query from static SQL.

I got the function to work by changing the return value from an array of UUIDs to a table of UUIDs and not using CTE:

CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
  RETURNS TABLE (hid UUID)
AS $$
DECLARE 
  query_str TEXT;
BEGIN
  CREATE TEMP TABLE tmp_names(name TEXT);
  INSERT INTO tmp_names SELECT UNNEST(nms);

  query_str := FORMAT('INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid', h_tbl);
  
  RETURN QUERY EXECUTE query_str;
  DROP TABLE tmp_names;
  
  RETURN;
END;
$$ LANGUAGE PLPGSQL;

I don't know if one way is better than the other, returning an array of UUIDs or a table of UUIDs, but at least I got it to work one of those ways. Plus, possibly not using a CTE is more efficient, so it may be better to stick with the version that returns a table of UUIDs.

What I would like to know is why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
If anyone can let me know what I did wrong, I would appreciate it.

CodePudding user response:

... why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.

No, it was only the CTE without (required) outer query. (You had SELECT ARRAY_AGG(hid) INTO ids FROM new_names in the static version.)

There are more problems, but just use this query instead:

INSERT INTO label(name)
SELECT unnest(nms)
ON     CONFLICT DO NOTHING
RETURNING hid;

label.name is defined UNIQUE NOT NULL, so this simple UPSERT can replace your function insert_label() completely.

It's much simpler and faster. It also defends against possible duplicates from within your input array that you didn't cover, yet. And it's safe under concurrent write load - as opposed to your original, which might run into race conditions. Related:

I would just use the simple query and replace the table name.
But if you still want a dynamic function:

CREATE OR REPLACE FUNCTION insert_label(_tbl regclass, _nms text[])
  RETURNS TABLE (hid uuid)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE format(
   $$
   INSERT INTO %s(name)
   SELECT unnest($1)
   ON     CONFLICT DO NOTHING
   RETURNING hid
   $$, _tbl)
   USING _nms;
END
$func$;

If you don't need an array as result, stick with the set (RETURNS TABLE ...). Simpler.

Pass values (_nms) to EXECUTE in a USING clause.

The tablename (_tbl) is type regclass, so the format specifier %I for format() would be wrong. Use %s instead. See:

  • Related