I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation
. I have the first part of the process using this function.
create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
returns void
language plpgsql as
$func$
begin
raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null');
end
$func$;
It is executed as follows:
-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );
select * from temp_stack;
Note that I didn't use relation
, yet. The INSERT
shall also assign relation
, but I can't figure out how to make that happen to get something like:
label | source | target | attr | graph |
---|---|---|---|---|
:associated_with | 638000 | ARAS | ||
:associated_with | 202000 | JASE | ||
:associated_with | 638010 | JASE | ||
:associated_with | 638000 | JASE | ||
:associated_with | 202100 | JASE | ||
:documents | A | 638010 | ||
:documents | A | 202000 | ||
:documents | A | 202100 | ||
:documents | B | 638000 | ||
:documents | A | 638000 | ||
:documents | B | 124004 | ||
:documents | B | 202100 |
My challenges are:
- How to integrate
relation
in theINSERT
? When I try to useVALUES
and comma separation I get an "error near select". - How to allow strings starting with ":" in
relation
? I'm anticipating here, the inclusion of the colon has given me challenges in the past.
How can I do this? Or is there a better approach?
Toy data model:
drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;
CodePudding user response:
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
db<>fiddle here
Most importantly, use format()
to concatenate your dynamic SQL commands safely. And use the format specifier %I
for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number
. That's where your original failed.
We could concatenate _relation
as string to be inserted into label
, too. But the preferable way to pass values to EXECUTE
is with the USING
clause. $1
inside the SQL string passed to EXECUTE
is a placeholder for the first USING
argument. Not to be confused with $1
referencing function parameters in the context of the function body outside EXECUTE
! (You can pass any string, leading colon (:
) does not matter, the string is not interpreted when done right.)
See:
- Format specifier for integer variables in format() for EXECUTE?
- Table name as a PostgreSQL function parameter
I replaced the DELETE
in your original with a WHERE
clause to the SELECT
of the INSERT
. Don't insert rows in the first place, instead of deleting them again later.
(%1$I, %2$I) IS NOT NULL
only qualifies when both values are NOT NULL
.
About that:
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp.
That's typically optional as the temporary schema comes first in the search_path
by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path
. So better safe than sorry. See:
I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT
parameter, I am allowed to skip the RETURNS
clause. See: