I have a table with strings in one column, which are actually storing other SQL Queries written before and stored to be ran at later times. They contain parameters such as '@organisationId' or '@enterDateHere'. I want to be able to extract these.
Example:
ID | Query |
---|---|
1 | SELECT * FROM table WHERE id = @organisationId |
2 | SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere |
3 | SELECT name '@' domain FROM user |
I want the following:
ID | Parameters |
---|---|
1 | @organisationId |
2 | @startDate, @endDate, @enterOrgHere |
3 | NULL |
No need to worry about how to separate/list them, as long as they are clearly visible and as long as the query lists all of them, which I don't know the number of. Please note that sometimes the queries contain just @ for example when email binding is being done, but it's not a parameter. I want just strings which start with @ and have at least one letter after it, ending with a non-letter character (space, enter, comma, semi-colon). If this causes problems, then return all strings starting with @ and I will simply identify the parameters manually.
It can include usage of Excel/Python/C# if needed, but SQL is preferable.
CodePudding user response:
The official way to interrogate the parameters is with sp_describe_undeclared_parameters, eg
exec sp_describe_undeclared_parameters @tsql = N'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'
CodePudding user response:
It is very simple to implement by using tokenization via XML and XQuery.
Notable points:
- 1st
CROSS APPLY
is tokenazing Query column as XML. - 2nd
CROSS APPLY
is filtering out tokens that don't have "@" symbol.
SQL #1
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Query VARCHAR(2048));
INSERT INTO @tbl (Query) VALUES
('SELECT * FROM table WHERE id = @organisationId'),
('SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'),
('SELECT name ''@'' domain FROM user');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.ID
, Parameters = IIF(t2.Par LIKE '@[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['
REPLACE(Query, @separator, ']]></r><r><![CDATA[')
']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"@")])').value('text()[1]','VARCHAR(1024)'))) AS t2(Par)
SQL #2
A cleansing step was added to handle other than a regular space whitespaces first.
SELECT t.*
, Parameters = IIF(t2.Par LIKE '@[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' Query ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t0(pure)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['
REPLACE(Pure, @separator, ']]></r><r><![CDATA[')
']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"@")])')
.value('text()[1]','VARCHAR(1024)'))) AS t2(Par);
Output
ID | Parameters |
---|---|
1 | @organisationId |
2 | @startDate @endDate @enterOrgHere |
3 | NULL |
CodePudding user response:
Using SQL Server for parsing is a very bad idea because of low performance and lack of tools. I highly recommend using .net assembly or external language (since your project is in python anyway) with regexp or any other conversion method. However, as a last resort, you can use something like this extremely slow and generally horrible code (this code working just on sql server 2017 , btw. On earlier versions code will be much more terrible):
DECLARE @sql TABLE
(
id INT PRIMARY KEY IDENTITY
, sql_query NVARCHAR(MAX)
);
INSERT INTO @sql (sql_query)
VALUES (N'SELECT * FROM table WHERE id = @organisationId')
, (N'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere')
, (N' SELECT name ''@'' domain FROM user')
;
WITH prepared AS
(
SELECT id
, SUBSTRING(sql_query, CHARINDEX('@', sql_query) 1, LEN(sql_query)) prep_string
FROM @sql
),
parsed AS
(
SELECT id
, IIF(CHARINDEX(CHAR(32), value) = 0
, SUBSTRING(value, 1, LEN(VALUE))
, SUBSTRING(value, 1, CHARINDEX(CHAR(32), value) -1)
) parsed_value
FROM prepared p
CROSS APPLY STRING_SPLIT(p.prep_string, '@')
)
SELECT id, '@' STRING_AGG(IIF(parsed_value LIKE '[a-zA-Z]%', parsed_value, NULL) , ', @')
FROM parsed
GROUP BY id
CodePudding user response:
You can use string split, and then remove the undesired caracters, here's a query :
DROP TABLE IF EXISTS #TEMP
SELECT 1 AS ID ,'SELECT * FROM table WHERE id = @organisationId' AS Query
INTO #TEMP
UNION ALL SELECT 2, 'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'
UNION ALL SELECT 3, 'SELECT name ''@'' domain FROM user'
;WITH cte as
(
SELECT ID,
Query,
STRING_AGG(REPLACE(REPLACE(REPLACE(value,'<',''),'>',''),'=',''),', ') AS Parameters
FROM #TEMP
CROSS APPLY string_split(Query,' ')
WHERE value LIKE '%@[a-z]%'
GROUP BY ID,
Query
)
SELECT #TEMP.*,cte.Parameters
FROM #TEMP
LEFT JOIN cte on #TEMP.ID = cte.ID