I am looking to create a python function that will take a long SQL script that I have to create a table and place the session variables into the script so it can be used as a view within Snowflake. For example,
SET TABLE_NAME = MY_TABLE_NAME;
CREATE OR REPLACE VIEW MY_VIEW AS (
SELECT * FROM IDENTIFIER($TABLE_NAME)
)
With the python script, the previous block becomes
CREATE OR REPLACE VIEW MY_VIEW AS(
SELECT * FROM MY_TABLE
)
However, during testing, I realized that the if the variable name is within another function, the last parenthesis is captured and is removed. Is there a way that I can replace the string with the variable value only if it is wrapped in the identifier function?
I would like this code:
IDENTIFIER($VAR_NAME)
identifier($VAR_NAME)
SELECT * FROM $VAR_NAME
DATEADD('DAY', 1,$VAR_NAME)
To become:
VAR_NAME
VAR_NAME
SELECT * FROM VAR_NAME
DATEADD('DAY', 1,VAR_NAME)
This is what I have tried so far. https://regex101.com/r/2SriK9/2 Thanks.
P.S. In the last example, if var_name were a function, it would need to have the function and then close with a closing parenthesis: DATEADD('DAY', 1,MY_FUNC()) [Currently, my output makes it DATEADD('DAY', 1,MY_FUNC()] with no closing parenthesis on the dateadd function.
CodePudding user response:
There are two patterns you're looking for here: one enclosed in the identifier function, and the other with just a preceding $
character, so you can use an alternation pattern to search for both of them, capture the variable names of each, if any, and replace the match with what's captured.
Find (with the case-insensitive flag):
identifier\(\$(\w )\)|\$(\w )
Replace with:
\1\2
Demo: https://regex101.com/r/2SriK9/3
CodePudding user response:
Instead of using regular expressions to perform complex replacements, you can use a library such as sqlglot
to parse the query into an AST, which you can then update to produce your desired query:
import sqlglot
def run_updates(ast, scope):
if isinstance(ast, list):
#remove any `SET` queries:
return [run_updates(i, scope) for i in ast if \
not isinstance(i, sqlglot.expressions.Command) or str(i.this).lower() != 'set']
if isinstance(ast, sqlglot.expressions.Create):
if isinstance(ast.this, (sqlglot.expressions.Func, sqlglot.expressions.UserDefinedFunction)):
#node is a function, save the function name
scope['functions'].append(ast.this.this.this)
if isinstance(ast, sqlglot.expressions.Anonymous):
if ast.this.lower() == 'identifier':
#remove `IDENTIFER($VALUE)` expressions
return sqlglot.expressions.Identifier(this = ast.args['expressions'][0].this.this[1:], quoted=False)
if isinstance(ast, sqlglot.expressions.Identifier):
if isinstance(ast.this, str) and ast.this[0] == '$':
#replace general `$session_variable` expressions
if ast.this[1:] in scope['functions']:
#if the session variable is function, update accordingly
return sqlglot.expressions.Anonymous(this = ast.this[1:])
return sqlglot.expressions.Identifier(this = ast.this[1:], quoted=False)
if hasattr(ast, 'args'):
ast.args = {a:run_updates(b, scope) for a, b in ast.args.items()}
return ast
def replace_identifiers(query):
ast, scope = sqlglot.parse(query), {'variables':{}, 'functions':[]}
return ';\n'.join(i.sql() for i in run_updates(ast, scope))
Tests:
#replacing IDENFITIER
s = '''
SET TABLE_NAME = MY_TABLE_NAME;
CREATE OR REPLACE VIEW MY_VIEW AS (
SELECT * FROM IDENTIFIER($TABLE_NAME)
)
'''
print(replace_identifiers(s))
#replacing IDENTIFIER and session variables
s1 = '''
IDENTIFIER($VAR_NAME);
identifier($VAR_NAME);
SELECT * FROM $VAR_NAME;
DATEADD('DAY',1,$VAR_NAME);
'''
print(replace_identifiers(s1))
#replacing session variable as a function call, since the test name is declared as a function in a prior line
s2 = '''
CREATE FUNCTION VAR_NAME() RETURNS TIMESTAMP AS 'SELECT now()';
DATEADD('DAY',1,$VAR_NAME);
'''
print(replace_identifiers(s2))
Output:
CREATE OR REPLACE VIEW MY_VIEW AS (SELECT * FROM TABLE_NAME)
VAR_NAME;
VAR_NAME;
SELECT * FROM VAR_NAME;
DATEADD('DAY', 1, VAR_NAME)
CREATE FUNCTION VAR_NAME() RETURNS TIMESTAMP AS 'SELECT now()';
DATEADD('DAY', 1, VAR_NAME())