Query parent table on multiple child conditions-CodePudding

I have two tables (Mail and MailTag), each containing over 500,000 records so query efficiency is very important.

For each email record (Mail) there is a minimum of one tag recorded in the MailTag child table. Every email can have multiple tag (so 1 or many).

Parent table is:

Mail:

IdEmail	Subject	Other...
1	...	...
2	...	...
3	...	...
4	...	...
5	...	...
etc.	...	...

and child table

MailTag:

Id	IdTag	IdMail
1	9	1
2	9	2
3	9	3
4	10	2
5	10	3
6	11	3
7	12	3
8	9	5
9	10	5
10	11	5
11	12	5
12	9	6
13	11	6
14	13	6
etc.

I would like to query emails by tags in two general scenarios:

Scenario 1: (Tags whitelist 9, 11)
Select all emails that have both IdTag=9 and IdTag=11. Returns emails 3, 5 and 6.

Scenario 2: (Tags whitelist 9, 11 and blacklist 10, 12)
Select all emails that have both IdTag=9 and IdTag=11 but not any of the IdTag=10 or IdTag=12. Returns only email 6.

White/blacklist can contain 0 or more IdTags. EDIT: this part has to be parameterized.

How can I write queries for both scenarios? Preferably avoiding any kind of loops ("while loop"), if possible.

CodePudding user response：

As I mentioned, use a conditional aggregate in the HAVING. For just needing those tags, then use the following:

SELECT M.IdEmail,
       M.Subject,
       M.[Other...]
FROM dbo.Mail
     JOIN dbo.MailTag MT ON M.IdMail = MT.IdMail
GROUP BY M.IdEmail,
         M.Subject,
         M.[Other...]
HAVING COUNT(CASE WHEN MT.idTag = 9 THEN 1 END) > 0
   AND COUNT(CASE WHEN MT.idTag = 11 THEN 1 END) > 0;

For the black list, then then same applies, but you want 0 in this case:

SELECT M.IdEmail,
       M.Subject,
       M.[Other...]
FROM dbo.Mail
     JOIN dbo.MailTag MT ON M.IdMail = MT.IdMail
GROUP BY M.IdEmail,
         M.Subject,
         M.[Other...]
HAVING COUNT(CASE WHEN MT.idTag = 9 THEN 1 END) > 0
   AND COUNT(CASE WHEN MT.idTag = 11 THEN 1 END) > 0
   AND COUNT(CASE WHEN MT.idTag IN (10,12) THEN 1 END) = 0;

Based on the new Goal Posts this might work, however, it doesn't return Email 6, because no such ID exists in your sample data:

USE Sandbox;
GO

DECLARE @Whitelist AS table (ID int UNIQUE);
INSERT INTO @Whitelist (ID)
VALUES(9),(11);

DECLARE @Blacklist AS table (ID int UNIQUE);
INSERT INTO @Blacklist (ID)
VALUES(10),(12);

WITH Mail AS(
    SELECT *
    FROM (VALUES(1,'...','...'),
                (2,'...','...'),
                (3,'...','...'),
                (4,'...','...'),
                (5,'...','...'))V(IdEmail,Subject,Other)),
MailTag AS (
    SELECT *
    FROM (VALUES(1,9,1),
                (2,9,2),
                (3,9,3),
                (4,10,2),
                (5,10,3),
                (6,11,3),
                (7,12,3),
                (8,9,5),
                (9,10,5),
                (10,11,5),
                (11,12,5),
                (12,9,6),
                (13,11,6),
                (14,13,6))V(Id,IdTag,IdMail))
SELECT M.IdEmail,
       M.Subject,
       M.Other
FROM Mail M
     JOIN MailTag MT ON M.IdEMail = MT.IdMail --Why is thuis called IdEmail in one table, and IdMail in the other table?
     LEFT JOIN @Whitelist WL ON MT.IdTag = WL.ID
     LEFT JOIN @Blacklist BL ON MT.IdTag = BL.ID
GROUP BY M.IdEmail,
         M.Subject,
         M.Other
HAVING COUNT(DISTINCT WL.ID) = (SELECT COUNT(ID) FROM @Whitelist)
   AND COUNT(BL.ID) = 0;