My Postgres database was migrated from MySQL using a tool and the code base has lot of syntax issues.
One of the issues is with the UPDATE statements inside procedures where the column name contains alias name as below.
UPDATE table1 t1 SET t1.col1 = 'some_value';
Having alias name after SET
keyword as in t1.col1
is a wrong syntax in Postgres.
As the number of procedures is huge, I'm trying to write a regular expression to find which procedures have this pattern.
select proname, prosrc from pg_proc
where regexp_replace(prosrc, E'[\\n\\r] ', ' ', 'g' ) ~* '[:UPDATE:]([[:space:]] )[:set:]([[:space:]] )^[a-z]([a-z]|[0-9]) \.^[a-z]([a-z]|[0-9]) ([[:space:]]*)[:=:]';
The regexp_replace
part on the left side of the condition is to remove line breaks which works fine. The main part on the right side is not returning the desired result.
I am trying to find the procedures that has UPDATE
keyword followed by one more space, followed by SET keyword, followed by one more space, followed by one more alphanumeric characters (which starts with an alphabet), followed by a dot(.) , followed by one more alphanumeric characters (which starts with an alphabet), followed by zero or more spaces, followed by an equal to sign (=).
But the statement I formed seems to be wrong. Any help on this is much appreciated.
CodePudding user response:
I think this may be more complex than you think... A procedure/function may have more than one update statement, and a simple regex will likely come up with many false positives.
I think you want a function to do a better job of eliminating false positives that result from:
- Alias that occurs after the update, in a separate statement (after the semicolon) -- fix by splitting statements by semicolons
- Aliases within the update that occur after a
FROM
orWHERE
clause, which are valid and not syntax errors - Less frequent, aliases used in a CTE prior to the update - fix by ignoring everything prior to the
update
keyword
Here is a boilerplate for what I think will get you close and minimize false positives:
create or replace function find_bad_syntax()
returns setof text
language plpgsql as
$BODY$
DECLARE
r pg_proc%rowtype;
dml varchar[];
eval varchar;
alias varchar;
BEGIN
FOR r IN
SELECT * FROM pg_proc WHERE prosrc ilike '%update%'
LOOP
dml := string_to_array (r.prosrc, ';');
foreach eval in array dml
loop
alias := substring (lower (eval), 'update [\w.] \s (\w )');
continue when alias is null or lower (alias) = 'set';
eval := regexp_replace (eval, 'from\s .*', '', 'i');
eval := regexp_replace (eval, 'where\s.*', '', 'i');
eval := regexp_replace (eval, '^.*update', '', 'i');
if eval ~* (alias || '\.\w \s =') then
return next format ('PROC: %s ALIAS: %s => ERROR: %s', r.proname, alias, eval);
end if;
end loop;
END LOOP;
END;
$BODY$
So to get the results simply:
select * from find_bad_syntax()
I did a test run, and your function did show up in the results.
CodePudding user response:
Yes, in PostgreSQL this is not working:
UPDATE table1 t1 SET t1.col1 = 'some_value';
But, this is working correctly:
UPDATE table1 t1 SET col1 = 'some_value';
So we only need to clear the update field alias. Example for do it:
with t1(txt) as (
select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select regexp_replace(t1.txt, 'SET (.*)\d\.', 'SET ', 'g') from t1
CodePudding user response:
For finding, selecting:
with t1(txt) as (
select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)\d\.'
CodePudding user response:
Some small changes:
with t1(txt) as (
select 'UPDATE table1 t1 SET t1.col1 = some_value'
union all
select 'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
union all
select 'select pp3.* from table1 pp3'
union all
select 'UPDATE table1 SET col1 = some_value'
union all
select 'UPDATE table1 t SET t.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)\w\.'
--Result:
'UPDATE table1 t1 SET t1.col1 = some_value'
'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
'UPDATE table1 t SET t.col1 = some_value'