Home > database >  Query parent table on multiple child conditions
Query parent table on multiple child conditions

Time:07-08

I have two tables (Mail and MailTag), each containing over 500,000 records so query efficiency is very important.

For each email record (Mail) there is a minimum of one tag recorded in the MailTag child table. Every email can have multiple tag (so 1 or many).

Parent table is:

Mail:

IdEmail Subject Other...
1 ... ...
2 ... ...
3 ... ...
4 ... ...
5 ... ...
etc. ... ...

and child table

MailTag:

Id IdTag IdMail
1 9 1
2 9 2
3 9 3
4 10 2
5 10 3
6 11 3
7 12 3
8 9 5
9 10 5
10 11 5
11 12 5
12 9 6
13 11 6
14 13 6
etc.

I would like to query emails by tags in two general scenarios:

Scenario 1: (Tags whitelist 9, 11)
Select all emails that have both IdTag=9 and IdTag=11. Returns emails 3, 5 and 6.

Scenario 2: (Tags whitelist 9, 11 and blacklist 10, 12)
Select all emails that have both IdTag=9 and IdTag=11 but not any of the IdTag=10 or IdTag=12. Returns only email 6.

White/blacklist can contain 0 or more IdTags. EDIT: this part has to be parameterized.

How can I write queries for both scenarios? Preferably avoiding any kind of loops ("while loop"), if possible.

CodePudding user response:

As I mentioned, use a conditional aggregate in the HAVING. For just needing those tags, then use the following:

SELECT M.IdEmail,
       M.Subject,
       M.[Other...]
FROM dbo.Mail
     JOIN dbo.MailTag MT ON M.IdMail = MT.IdMail
GROUP BY M.IdEmail,
         M.Subject,
         M.[Other...]
HAVING COUNT(CASE WHEN MT.idTag = 9 THEN 1 END) > 0
   AND COUNT(CASE WHEN MT.idTag = 11 THEN 1 END) > 0;

For the black list, then then same applies, but you want 0 in this case:

SELECT M.IdEmail,
       M.Subject,
       M.[Other...]
FROM dbo.Mail
     JOIN dbo.MailTag MT ON M.IdMail = MT.IdMail
GROUP BY M.IdEmail,
         M.Subject,
         M.[Other...]
HAVING COUNT(CASE WHEN MT.idTag = 9 THEN 1 END) > 0
   AND COUNT(CASE WHEN MT.idTag = 11 THEN 1 END) > 0
   AND COUNT(CASE WHEN MT.idTag IN (10,12) THEN 1 END) = 0;

Based on the new Goal Posts this might work, however, it doesn't return Email 6, because no such ID exists in your sample data:

USE Sandbox;
GO

DECLARE @Whitelist AS table (ID int UNIQUE);
INSERT INTO @Whitelist (ID)
VALUES(9),(11);

DECLARE @Blacklist AS table (ID int UNIQUE);
INSERT INTO @Blacklist (ID)
VALUES(10),(12);

WITH Mail AS(
    SELECT *
    FROM (VALUES(1,'...','...'),
                (2,'...','...'),
                (3,'...','...'),
                (4,'...','...'),
                (5,'...','...'))V(IdEmail,Subject,Other)),
MailTag AS (
    SELECT *
    FROM (VALUES(1,9,1),
                (2,9,2),
                (3,9,3),
                (4,10,2),
                (5,10,3),
                (6,11,3),
                (7,12,3),
                (8,9,5),
                (9,10,5),
                (10,11,5),
                (11,12,5),
                (12,9,6),
                (13,11,6),
                (14,13,6))V(Id,IdTag,IdMail))
SELECT M.IdEmail,
       M.Subject,
       M.Other
FROM Mail M
     JOIN MailTag MT ON M.IdEMail = MT.IdMail --Why is thuis called IdEmail in one table, and IdMail in the other table?
     LEFT JOIN @Whitelist WL ON MT.IdTag = WL.ID
     LEFT JOIN @Blacklist BL ON MT.IdTag = BL.ID
GROUP BY M.IdEmail,
         M.Subject,
         M.Other
HAVING COUNT(DISTINCT WL.ID) = (SELECT COUNT(ID) FROM @Whitelist)
   AND COUNT(BL.ID) = 0;
  • Related