Home > Enterprise >  Split column value into substrings and search for pattern in SQL
Split column value into substrings and search for pattern in SQL

Time:04-24

I have a table like this:

campaign code
AL2330GH_HDKASL_QCLKP NULL
JPDJK34_QPKSLL_QKPAL NULL
QCK32_SDSKDS_TLLKA NULL

I want to update the above table by populating the column 'code' with a substring in column 'campaign' which starts with 'AL', 'QC', or 'QP'. All the column values have 3 substrings separated by an '_'. If none of the substrings matches with the provided values, then keep the 'code' column value as NULL. And if multiple matches happen, take the first substring.

Desired Output:

campaign code
AL2330GH_HDKASL_QCLKP AL2330GH
JPDJK34_QPKSLL_QKPAL QPKSLL
QCK32_SDSKDS_TLLKA QCK32

Link to try out the problem: https://www.db-fiddle.com/f/8qoFDL1RmjwpwFNP3LP4eK/1

CodePudding user response:

Here's a method using OPENJSON():

;WITH src AS
(
  SELECT campaign, value, code, 
    rn = ROW_NUMBER() OVER (PARTITION BY campaign ORDER BY [key])
  FROM
  (
    SELECT campaign, [key], value, code
    FROM dbo.SomeTable 
    CROSS APPLY OPENJSON(CONCAT('["',
      REPLACE(STRING_ESCAPE(campaign,'JSON'),'_','","'),'"]')) AS j
  ) AS x WHERE LEFT(value,2) IN ('AL','QC','QP')
)
UPDATE src SET code = value WHERE rn = 1;

CodePudding user response:

You can try to use STRING_SPLIT with CROSS APPLY and ROW_NUMBER window function to make it.

CHARINDEX function will find the first match position value then we can put the split value in the first parameter, then we can find which string the first appearance.

SELECT campaign,value
FROM (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY campaign ORDER BY CHARINDEX(v.value,t1.campaign)) rn 
    FROM  mainTable t1
    CROSS APPLY STRING_SPLIT(t1.campaign,'_') v
    WHERE (value LIKE 'AL%'
    OR value LIKE 'QC%'
    OR value LIKE 'QP%')
) t1
WHERE rn = 1

If you want to UPDATE values you can try UPDATE like this.

UPDATE t1
SET
    code = value
FROM (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY campaign ORDER BY CHARINDEX(v.value,t1.campaign)) rn 
    FROM  mainTable t1
    CROSS APPLY STRING_SPLIT(t1.campaign,'_') v
    WHERE (value LIKE 'AL%'
    OR value LIKE 'QC%'
    OR value LIKE 'QP%')
) t1
WHERE rn = 1

sqlfiddle

CodePudding user response:

Please try the following solution.

It is using XML and XQuery for tokenization. XML/XQuery data model is based on ordered sequences. Exactly what we need for the scenario.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, campaign varchar(50), code varchar(20));
INSERT INTO @tbl (campaign, code) VALUES
('AL2330GH_HDKASL_QCLKP', NULL),
('JPDJK34_QPKSLL_QKPAL', NULL),
('QCK32_SDSKDS_TLLKA', NULL);
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = '_';

UPDATE t
SET code = c.query('
    for $x in /root/r[substring(text()[1],1,2)=("AL","QC","QP")]
    return $x').value('(/r/text())[1]', 'VARCHAR(20)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
      REPLACE(campaign, @separator, ']]></r><r><![CDATA[')   
      ']]></r></root>' AS XML)) AS t1(c);

Output

 ---- ----------------------- ---------- 
| id |       campaign        |   Code   |
 ---- ----------------------- ---------- 
|  1 | AL2330GH_HDKASL_QCLKP | AL2330GH |
|  2 | JPDJK34_QPKSLL_QKPAL  | QPKSLL   |
|  3 | QCK32_SDSKDS_TLLKA    | QCK32    |
 ---- ----------------------- ---------- 
  • Related