Home > Mobile >  Extracting parameters from strings - SQL Server
Extracting parameters from strings - SQL Server

Time:11-30

I have a table with strings in one column, which are actually storing other SQL Queries written before and stored to be ran at later times. They contain parameters such as '@organisationId' or '@enterDateHere'. I want to be able to extract these.

Example:

ID Query
1 SELECT * FROM table WHERE id = @organisationId
2 SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere
3 SELECT name '@' domain FROM user

I want the following:

ID Parameters
1 @organisationId
2 @startDate, @endDate, @enterOrgHere
3 NULL

No need to worry about how to separate/list them, as long as they are clearly visible and as long as the query lists all of them, which I don't know the number of. Please note that sometimes the queries contain just @ for example when email binding is being done, but it's not a parameter. I want just strings which start with @ and have at least one letter after it, ending with a non-letter character (space, enter, comma, semi-colon). If this causes problems, then return all strings starting with @ and I will simply identify the parameters manually.

It can include usage of Excel/Python/C# if needed, but SQL is preferable.

CodePudding user response:

The official way to interrogate the parameters is with sp_describe_undeclared_parameters, eg

exec sp_describe_undeclared_parameters @tsql =  N'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'  

CodePudding user response:

It is very simple to implement by using tokenization via XML and XQuery.

Notable points:

  1. 1st CROSS APPLY is tokenazing Query column as XML.
  2. 2nd CROSS APPLY is filtering out tokens that don't have "@" symbol.

SQL #1

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Query VARCHAR(2048));
INSERT INTO @tbl (Query) VALUES
('SELECT * FROM table WHERE id = @organisationId'),
('SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'),
('SELECT name   ''@''   domain FROM user');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);

SELECT t.ID
    , Parameters = IIF(t2.Par LIKE '@[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
      REPLACE(Query, @separator, ']]></r><r><![CDATA[')   
      ']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"@")])').value('text()[1]','VARCHAR(1024)'))) AS t2(Par)

SQL #2

A cleansing step was added to handle other than a regular space whitespaces first.

SELECT t.*
    , Parameters = IIF(t2.Par LIKE '@[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA['   Query   ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t0(pure)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
      REPLACE(Pure, @separator, ']]></r><r><![CDATA[')   
      ']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"@")])')
                .value('text()[1]','VARCHAR(1024)'))) AS t2(Par);

Output

ID Parameters
1 @organisationId
2 @startDate @endDate @enterOrgHere
3 NULL

CodePudding user response:

Using SQL Server for parsing is a very bad idea because of low performance and lack of tools. I highly recommend using .net assembly or external language (since your project is in python anyway) with regexp or any other conversion method. However, as a last resort, you can use something like this extremely slow and generally horrible code (this code working just on sql server 2017 , btw. On earlier versions code will be much more terrible):

DECLARE @sql TABLE
(
  id INT PRIMARY KEY IDENTITY
, sql_query NVARCHAR(MAX)
);

INSERT INTO @sql (sql_query)
VALUES (N'SELECT * FROM table WHERE id = @organisationId')
     , (N'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere')
     , (N'  SELECT name   ''@''   domain FROM user')
     ;


WITH prepared AS
(
  SELECT id
       , SUBSTRING(sql_query, CHARINDEX('@', sql_query)   1, LEN(sql_query)) prep_string
    FROM @sql
),
parsed AS
(
SELECT id
     , IIF(CHARINDEX(CHAR(32), value) = 0
          , SUBSTRING(value, 1, LEN(VALUE))
          , SUBSTRING(value, 1, CHARINDEX(CHAR(32), value) -1)
          ) parsed_value
  FROM prepared p
  CROSS APPLY STRING_SPLIT(p.prep_string, '@')
)
SELECT id, '@'   STRING_AGG(IIF(parsed_value LIKE '[a-zA-Z]%', parsed_value, NULL) , ', @')
  FROM parsed
GROUP BY id

CodePudding user response:

You can use string split, and then remove the undesired caracters, here's a query :

DROP TABLE IF EXISTS #TEMP

SELECT 1 AS ID ,'SELECT * FROM table WHERE id = @organisationId' AS Query
INTO #TEMP
UNION ALL SELECT 2, 'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'
UNION ALL SELECT 3, 'SELECT name   ''@''   domain FROM user'

;WITH cte as
(
    SELECT ID,
        Query,
        STRING_AGG(REPLACE(REPLACE(REPLACE(value,'<',''),'>',''),'=',''),', ') AS Parameters
    FROM #TEMP
    CROSS APPLY string_split(Query,' ')
    WHERE value LIKE '%@[a-z]%'
    GROUP BY ID,
        Query
)
SELECT #TEMP.*,cte.Parameters
FROM #TEMP
LEFT JOIN cte on #TEMP.ID = cte.ID
  • Related