Home > Software design >  Division query (WHERE NOT EXISTS) on 4 tables to return all customers purchased all the products
Division query (WHERE NOT EXISTS) on 4 tables to return all customers purchased all the products

Time:12-28

I would like to use a division query to have all the customers who bought all products.

The template is the following :

SELECT *
FROM Customers AS A
WHERE NOT EXISTS
(
    SELECT *
    FROM Products AS B
    WHERE NOT EXISTS
    (
        SELECT *
        FROM Purchases AS C
        WHERE C.CustomerID= A.CustomerID
        AND C.ProductID= B.ProductID
    ));

It's well explained in this topic: https://stackoverflow.com/a/71877187/17783040

My issue is I don't have 3 tables to join but 4. The division query is supposed to work the same but I can't succeed it.

So my 4 tables:

  • Customers: PK is CustomerID
  • Products: PK is ProductID
  • Invoices: PK is InvoiceID and FK is CustomerID
  • InvoicesLines: FK are InvoiceID and ProductID

I would like to practice this kind of query even if it's an hard way. I really understood the division query working for 3 tables in a many to many relationship.

But in my case, I need a 4th table to join CustomerID to ProductID because the intermediate table as InvoiceLine collects each line of the invoice for a particular ProductID and quatity purchased ( InvoiceID). Then the Invoice table contains only the InvoiceID and CustomerID.

For example with data:

Customers Table:

CustomerID  name        surname address
1           Charles     Smith   123 main street
2           Henry       Johnson 546 Drive
3           Jennifer    Davis   65 Avenue

Products Table:

ProductID   name    quantity
1           pc      12
2           usb     56
3           tv      67

Invoices Table:

InvoiceID   CustomerID
1           1
2           3
3           2
4           3
5           1

InvoiceLines Table:

InvoicelineID   ProductID   description InvoiceID
1               2           blablabla   1
2               1           blablabla   2
3               2           blablabla   2
4               1           blablabla   2
5               2           blablabla   3
6               1           blablabla   3
7               3           blablabla   4
8               3           blablabla   5

So it's the CustomerID #3 who have bought all the products (1,2 and 3) that we can see in the InvoiceLines table for the IncoiceID #2 and #4

This following query returns all the customers in the database without any restriction and not only the one who purchased all the products in catalogue:

SELECT *
FROM Customers AS C
WHERE NOT EXISTS
(
    SELECT *
    FROM Products AS P
    WHERE NOT EXISTS
    (
        SELECT *
        FROM Invoices AS I
        WHERE NOT EXISTS
            (
                SELECT *
                FROM InvoiceLines AS L
                WHERE I.CustomerID= C.CustomerID
                AND L.InvoiceID= I.InvoiceID
                AND L.ProductID= P.ProductID
            )));

It's other try is not working either:

SELECT *
FROM Customers AS C
WHERE NOT EXISTS
(
    SELECT *
    FROM Products AS P
    WHERE NOT EXISTS
    (
        SELECT *
        FROM 
            (
            SELECT * FROM InvoiceLines AS L, Invoices AS I
            WHERE  L.InvoiceID= I.InvoiceID
            ) AS Pu
        WHERE Pu.CustomerID= C.CustomerID
        AND Pu.ProductID= P.ProductID
        )
    );

Do you have any hint or advice to make it work using the division query template?

CodePudding user response:

The logic in your last query is correct, only that you need to replace select * with select l.ProductID, i.CustomerID (otherwise you will get "duplicate column name" errors):

SELECT *
FROM Customers AS C
WHERE NOT EXISTS
(
    SELECT *
    FROM Products AS P
    WHERE NOT EXISTS
    (
        SELECT *
        FROM 
            (
            SELECT L.ProductID, I.CustomerID
            FROM InvoiceLines AS L
            JOIN Invoices AS I ON L.InvoiceID = I.InvoiceID
            ) AS Pu
        WHERE Pu.CustomerID = C.CustomerID
        AND Pu.ProductID = P.ProductID
    )
)

CodePudding user response:

The 2 main columns of interest for solving the question are CustomerID and ProductID which are in Invoices and InvoiceLines, the other tables are of interest only to return additional columns once you get the result, which is the set of I.CustomerId for which the count(distinct L.ProductID) is equal to the count(P.ProductID).

  • Related