I have a T-SQL query to do in the most efficient way as possible.
Here is an example of my table:
A | B | C | D | E |
---|---|---|---|---|
1 | x, y | z | NULL | NULL |
2 | x | NULL | NULL | y |
3 | y | z | NULL | NULL |
4 | a | NULL | b | x |
Now, I need to do a query to classify my best matching records. Let's say that I need to take the top 3 records that match the more of the values 'x' & 'y' (it could be more than 2 values) into the columns B, C, D, E
A | NumberOfMatches | Comment |
---|---|---|
1 | 2 | Because Column B contains x, y |
2 | 2 | Because Column B contains x & Column E contains y |
3 | 1 | Because Column B contains y |
4 | 1 | Because Column E contains x |
Could you help me to find a good way to do this query?
CodePudding user response:
I would highly recommend that you change how your data is stored, storing delimited strings to record multiple values is a recipe for disaster. If you have a one to many relationship use a child table with a foreign key to the main table. There is very rarely a reason to store delimited data like this, and when you have to query it and manipulate it you realise why it is not advised.
Assuming that each of your columns are the same and can hold many values, you'll need to split all of them, which you can do using this:
SELECT t.A, upvt.Col, Value = TRIM(ss.value)
FROM #T AS t
CROSS APPLY (VALUES ('B', t.B), ('C', t.C), ('D', t.D), ('E', t.E)) upvt (Col, Value)
CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss;
Which gives:
A | Col | Value |
---|---|---|
1 | B | x |
1 | B | y |
1 | C | z |
2 | B | x |
2 | E | y |
3 | B | y |
3 | C | z |
4 | B | a |
4 | D | b |
4 | E | x |
With this normalised data, you can then just do a simple WHERE Value IN ('x', 'y')
along with GROUP BY
and COUNT(*)
:
IF OBJECT_ID(N'tempdb..#T ', 'U') IS NOT NULL
DROP TABLE #T ;
CREATE TABLE #T (A INT, B VARCHAR(4), C VARCHAR(4), D VARCHAR(4), E VARCHAR(4));
INSERT #T(A, B, C, D, E)
VALUES
(1, 'x, y', 'z', NULL, NULL),
(2, 'x', NULL, NULL, 'y'),
(3, 'y', 'z', NULL, NULL),
(4, 'a', NULL, 'b', 'x');
SELECT t.A, NumberOfMatches = COUNT(*)
FROM #T AS t
CROSS APPLY (VALUES (t.B), (t.C), (t.D), (t.E)) upvt (Value)
CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss
WHERE TRIM(ss.value) IN ('x', 'y')
GROUP BY t.A
ORDER BY COUNT(*) DESC, t.A;
CodePudding user response:
If you can't normalize the model you can use this query to get your expected result:
select a, count(*) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, trim(value) targetValue from (
select a, columnName, value result
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
outer apply string_split(result,',')
) r
where targetValue in ('x','y')
group by a
-- Result
/*
a numberOfMatches comment
1 2 Because column B, B
2 2 Because column B, E
3 1 Because column B
4 1 Because column E
*/