Home > Enterprise >  Extract column values from one table and insert with modifications into another
Extract column values from one table and insert with modifications into another

Time:03-09

I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process using this function.

create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
    returns void
    language plpgsql as
$func$
begin
    raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
    execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
    execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null'); 
end
$func$;

It is executed as follows:

-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
 
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );

select * from temp_stack;

Note that I didn't use relation, yet. The INSERT shall also assign relation, but I can't figure out how to make that happen to get something like:

label source target attr graph
:associated_with 638000 ARAS
:associated_with 202000 JASE
:associated_with 638010 JASE
:associated_with 638000 JASE
:associated_with 202100 JASE
:documents A 638010
:documents A 202000
:documents A 202100
:documents B 638000
:documents A 638000
:documents B 124004
:documents B 202100

My challenges are:

  • How to integrate relation in the INSERT? When I try to use VALUES and comma separation I get an "error near select".
  • How to allow strings starting with ":" in relation? I'm anticipating here, the inclusion of the colon has given me challenges in the past.

How can I do this? Or is there a better approach?

Toy data model:

drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;

CodePudding user response:

-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text := format(
       'INSERT INTO pg_temp.%3$I (label, source, target)
        SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
        WHERE (%1$I, %2$I) IS NOT NULL'
      , _s, _v, _tbl, _tbl_src);
BEGIN
   -- RAISE NOTICE '%', _sql;  -- debug
   EXECUTE _sql USING _relation;
   GET DIAGNOSTICS row_count = ROW_COUNT;  -- return number of inserted rows
END
$func$;

db<>fiddle here

Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.

We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.) See:

I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.

(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL. About that:

Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.

I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:

I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:

  • Related