Home > Software engineering >  Counting separate occurrences in a string T-SQL
Counting separate occurrences in a string T-SQL

Time:03-02

I need help counting how many times a 'pattern' is found within a string. I'm using a tool which extracts inputted patters into a column. My dateset looks like the following:

Id MatchedPatterns
1 "[Pattern One: SE09B00218], [Pattern Two: 0000000-09000,0974837-01]"
2 "[Pattern One: SE09B00218,DA02F11418], [Pattern Two: 0000000-09,0974837-01]"
3 "[Pattern One: TR38G01678,DA02F11418,KJ73F4567]"
4 "[Pattern Two: 0000009-00000007,1234567-02,0939485-0004,09836473-1]"

You can see that each pattern can be found multiple times (up to 200), separated by a comma or it can be null. Each pattern identified can also be different lengths.

Here is what I want the output to look like:

Id PatternOneCount PatternTwoCount
1 1 2
2 2 2
3 3 0
4 0 4

Is this possible?

Many thanks

CodePudding user response:

Assuming you're on a (fully) supported version of SQL Server, you could transform your data into valid JSON, and then consume it.

So first you REPLACE the brackets ([]) with double quotes ("), and the wrap the colons (:) with double quotes as well. Then you wrap the entire string in braces ({}), to get your JSON string.

Next you consume that JSON with OPENJSON, use STRING_SPLIT to split the delimited data, and then finally use conditional aggregation... yuck.

SELECT V.Id,
       COUNT(CASE J.[key] WHEN 'Pattern One' THEN SS.[value] END) AS PatternOne,
       COUNT(CASE J.[key] WHEN 'Pattern Two' THEN SS.[value] END) AS PatternTwo
FROM (VALUES(1,'[Pattern One: SE09B00218], [Pattern Two: 0000000-09000,0974837-01]'),
            (2,'[Pattern One: SE09B00218,DA02F11418], [Pattern Two: 0000000-09,0974837-01]'),
            (3,'[Pattern One: TR38G01678,DA02F11418,KJ73F4567]'),
            (4,'[Pattern Two: 0000009-00000007,1234567-02,0939485-0004,09836473-1]'))V(Id,MatchedPatterns)
     --Make the pattern valid JSON
     CROSS APPLY (VALUES(REPLACE(TRANSLATE(V.MatchedPatterns,'[]','""'),': ','":"')))R(JsonPatterns)
     CROSS APPLY OPENJSON(CONCAT('{',R.JsonPatterns,'}')) J
     OUTER APPLY STRING_SPLIT(J.[value],',') SS
GROUP BY V.Id;
  • Related