I need help counting how many times a 'pattern' is found within a string. I'm using a tool that extracts the input patterns into a column. My dataset looks like the following:
Id | MatchedPatterns |
---|---|
1 | "[Pattern One: SE09B00218], [Pattern Two: 0000000-09000,0974837-01]" |
2 | "[Pattern One: SE09B00218,DA02F11418], [Pattern Two: 0000000-09,0974837-01]" |
3 | "[Pattern One: TR38G01678,DA02F11418,KJ73F4567]" |
4 | "[Pattern Two: 0000009-00000007,1234567-02,0939485-0004,09836473-1]" |
You can see that each pattern can be matched multiple times (up to 200), with the matches separated by commas, or it can be null. The matched values can also be of different lengths.
Here is what I want the output to look like:
Id | PatternOneCount | PatternTwoCount |
---|---|---|
1 | 1 | 2 |
2 | 2 | 2 |
3 | 3 | 0 |
4 | 0 | 4 |
Is this possible?
Many thanks
CodePudding user response:
Assuming you're on a (fully) supported version of SQL Server, you could transform your data into valid JSON, and then consume it.
So first you use `TRANSLATE` to replace the brackets (`[]`) with double quotes (`"`), then `REPLACE` to wrap the colons (`: `) in double quotes as well. Finally you wrap the entire string in braces (`{}`) to get your JSON string.
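As a quick sanity check, here's what that conversion produces for the first sample row (a minimal sketch, assuming SQL Server 2017+ so that `TRANSLATE` is available):
SELECT CONCAT('{',REPLACE(TRANSLATE('[Pattern One: SE09B00218], [Pattern Two: 0000000-09000,0974837-01]','[]','""'),': ','":"'),'}') AS JsonPatterns;
-- Returns: {"Pattern One":"SE09B00218", "Pattern Two":"0000000-09000,0974837-01"}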
Next you consume that JSON with `OPENJSON`, use `STRING_SPLIT` to split the delimited data, and then finally use conditional aggregation... yuck.
SELECT V.Id,
       COUNT(CASE J.[key] WHEN 'Pattern One' THEN SS.[value] END) AS PatternOneCount,
       COUNT(CASE J.[key] WHEN 'Pattern Two' THEN SS.[value] END) AS PatternTwoCount
FROM (VALUES(1,'[Pattern One: SE09B00218], [Pattern Two: 0000000-09000,0974837-01]'),
(2,'[Pattern One: SE09B00218,DA02F11418], [Pattern Two: 0000000-09,0974837-01]'),
(3,'[Pattern One: TR38G01678,DA02F11418,KJ73F4567]'),
(4,'[Pattern Two: 0000009-00000007,1234567-02,0939485-0004,09836473-1]'))V(Id,MatchedPatterns)
-- Make the patterns valid JSON
CROSS APPLY (VALUES(REPLACE(TRANSLATE(V.MatchedPatterns,'[]','""'),': ','":"')))R(JsonPatterns)
-- Consume the JSON as key/value pairs
CROSS APPLY OPENJSON(CONCAT('{',R.JsonPatterns,'}')) J
-- Split each pattern's comma-delimited values into rows
OUTER APPLY STRING_SPLIT(J.[value],',') SS
GROUP BY V.Id;
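With the sample data this returns 1/2, 2/2, 3/0 and 0/4 for PatternOneCount/PatternTwoCount, matching the desired output.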