I have two tables (Mail and MailTag), each containing over 500,000 records so query efficiency is very important.
For each email record (Mail) there is a minimum of one tag recorded in the MailTag child table. Every email can have multiple tag (so 1 or many).
Parent table is:
Mail:
IdEmail | Subject | Other... |
---|---|---|
1 | ... | ... |
2 | ... | ... |
3 | ... | ... |
4 | ... | ... |
5 | ... | ... |
etc. | ... | ... |
and child table
MailTag:
Id | IdTag | IdMail |
---|---|---|
1 | 9 | 1 |
2 | 9 | 2 |
3 | 9 | 3 |
4 | 10 | 2 |
5 | 10 | 3 |
6 | 11 | 3 |
7 | 12 | 3 |
8 | 9 | 5 |
9 | 10 | 5 |
10 | 11 | 5 |
11 | 12 | 5 |
12 | 9 | 6 |
13 | 11 | 6 |
14 | 13 | 6 |
etc. |
I would like to query emails by tags in two general scenarios:
Scenario 1: (Tags whitelist 9, 11)
Select all emails that have both IdTag=9 and IdTag=11. Returns emails 3, 5 and 6.
Scenario 2: (Tags whitelist 9, 11 and blacklist 10, 12)
Select all emails that have both IdTag=9 and IdTag=11 but not any of the IdTag=10 or IdTag=12. Returns only email 6.
White/blacklist can contain 0 or more IdTags. EDIT: this part has to be parameterized.
How can I write queries for both scenarios? Preferably avoiding any kind of loops ("while loop"), if possible.
CodePudding user response:
As I mentioned, use a conditional aggregate in the HAVING
. For just needing those tags, then use the following:
SELECT M.IdEmail,
M.Subject,
M.[Other...]
FROM dbo.Mail
JOIN dbo.MailTag MT ON M.IdMail = MT.IdMail
GROUP BY M.IdEmail,
M.Subject,
M.[Other...]
HAVING COUNT(CASE WHEN MT.idTag = 9 THEN 1 END) > 0
AND COUNT(CASE WHEN MT.idTag = 11 THEN 1 END) > 0;
For the black list, then then same applies, but you want 0
in this case:
SELECT M.IdEmail,
M.Subject,
M.[Other...]
FROM dbo.Mail
JOIN dbo.MailTag MT ON M.IdMail = MT.IdMail
GROUP BY M.IdEmail,
M.Subject,
M.[Other...]
HAVING COUNT(CASE WHEN MT.idTag = 9 THEN 1 END) > 0
AND COUNT(CASE WHEN MT.idTag = 11 THEN 1 END) > 0
AND COUNT(CASE WHEN MT.idTag IN (10,12) THEN 1 END) = 0;
Based on the new Goal Posts this might work, however, it doesn't return Email 6, because no such ID exists in your sample data:
USE Sandbox;
GO
DECLARE @Whitelist AS table (ID int UNIQUE);
INSERT INTO @Whitelist (ID)
VALUES(9),(11);
DECLARE @Blacklist AS table (ID int UNIQUE);
INSERT INTO @Blacklist (ID)
VALUES(10),(12);
WITH Mail AS(
SELECT *
FROM (VALUES(1,'...','...'),
(2,'...','...'),
(3,'...','...'),
(4,'...','...'),
(5,'...','...'))V(IdEmail,Subject,Other)),
MailTag AS (
SELECT *
FROM (VALUES(1,9,1),
(2,9,2),
(3,9,3),
(4,10,2),
(5,10,3),
(6,11,3),
(7,12,3),
(8,9,5),
(9,10,5),
(10,11,5),
(11,12,5),
(12,9,6),
(13,11,6),
(14,13,6))V(Id,IdTag,IdMail))
SELECT M.IdEmail,
M.Subject,
M.Other
FROM Mail M
JOIN MailTag MT ON M.IdEMail = MT.IdMail --Why is thuis called IdEmail in one table, and IdMail in the other table?
LEFT JOIN @Whitelist WL ON MT.IdTag = WL.ID
LEFT JOIN @Blacklist BL ON MT.IdTag = BL.ID
GROUP BY M.IdEmail,
M.Subject,
M.Other
HAVING COUNT(DISTINCT WL.ID) = (SELECT COUNT(ID) FROM @Whitelist)
AND COUNT(BL.ID) = 0;