Home > Enterprise >  Postgres regular expression to find procedures with wrong Update syntax
Postgres regular expression to find procedures with wrong Update syntax

Time:09-14

My Postgres database was migrated from MySQL using a tool and the code base has lot of syntax issues.

One of the issues is with the UPDATE statements inside procedures where the column name contains alias name as below.

UPDATE table1 t1 SET t1.col1 = 'some_value';

Having alias name after SET keyword as in t1.col1 is a wrong syntax in Postgres.

As the number of procedures is huge, I'm trying to write a regular expression to find which procedures have this pattern.

select proname, prosrc from pg_proc 
where regexp_replace(prosrc, E'[\\n\\r] ', ' ', 'g' ) ~* '[:UPDATE:]([[:space:]] )[:set:]([[:space:]] )^[a-z]([a-z]|[0-9]) \.^[a-z]([a-z]|[0-9]) ([[:space:]]*)[:=:]';

The regexp_replace part on the left side of the condition is to remove line breaks which works fine. The main part on the right side is not returning the desired result.

I am trying to find the procedures that has UPDATE keyword followed by one more space, followed by SET keyword, followed by one more space, followed by one more alphanumeric characters (which starts with an alphabet), followed by a dot(.) , followed by one more alphanumeric characters (which starts with an alphabet), followed by zero or more spaces, followed by an equal to sign (=).

But the statement I formed seems to be wrong. Any help on this is much appreciated.

CodePudding user response:

I think this may be more complex than you think... A procedure/function may have more than one update statement, and a simple regex will likely come up with many false positives.

I think you want a function to do a better job of eliminating false positives that result from:

  • Alias that occurs after the update, in a separate statement (after the semicolon) -- fix by splitting statements by semicolons
  • Aliases within the update that occur after a FROM or WHERE clause, which are valid and not syntax errors
  • Less frequent, aliases used in a CTE prior to the update - fix by ignoring everything prior to the update keyword

Here is a boilerplate for what I think will get you close and minimize false positives:

create or replace function find_bad_syntax()
returns setof text
language plpgsql as
$BODY$
  DECLARE
    r pg_proc%rowtype;
    dml varchar[];
    eval varchar;
    alias varchar;
  BEGIN
    FOR r IN
      SELECT * FROM pg_proc WHERE prosrc ilike '%update%'
    LOOP
      dml := string_to_array (r.prosrc, ';');
      foreach eval in array dml
      loop
        alias := substring (lower (eval), 'update [\w.] \s (\w )');
      
        continue when alias is null or lower (alias) = 'set';
      
        eval := regexp_replace (eval, 'from\s .*', '', 'i');
        eval := regexp_replace (eval, 'where\s.*', '', 'i');
        eval := regexp_replace (eval, '^.*update', '', 'i');
        if eval ~* (alias || '\.\w \s =') then
          return next format ('PROC: %s ALIAS: %s => ERROR: %s', r.proname, alias, eval);
        end if;
      end loop;
    END LOOP;
  END;
$BODY$

So to get the results simply:

select * from find_bad_syntax()

I did a test run, and your function did show up in the results.

CodePudding user response:

Yes, in PostgreSQL this is not working:

UPDATE table1 t1 SET t1.col1 = 'some_value';

But, this is working correctly:

UPDATE table1 t1 SET col1 = 'some_value';

So we only need to clear the update field alias. Example for do it:

with t1(txt) as (
    select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select regexp_replace(t1.txt, 'SET (.*)\d\.', 'SET ', 'g') from t1 

CodePudding user response:

For finding, selecting:

with t1(txt) as (
    select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)\d\.'

CodePudding user response:

Some small changes:

with t1(txt) as (
    select 'UPDATE table1 t1 SET t1.col1 = some_value'
    union all 
    select 'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
    union all 
    select 'select pp3.* from table1 pp3'
    union all 
    select 'UPDATE table1 SET col1 = some_value'
    union all 
    select 'UPDATE table1 t SET t.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)\w\.'


--Result: 
'UPDATE table1 t1 SET t1.col1 = some_value'
'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
'UPDATE table1 t SET t.col1 = some_value'
  • Related