Home > Software engineering >  How to get values from one table that are not being used by top 3 results in another table?
How to get values from one table that are not being used by top 3 results in another table?

Time:12-03

I need to figure out what beers from table 'Beer' were not ordered by top 3 buyers from table 'Buyers'. Beer.BeerId is foreign key in Buyers.BeerId. Other important columns in 'Buyers' are: BuyId, PubId, StoreId and Quantity

dbo.Beer

BeerId
1
2
3
4
5

dbo.Buyers

BuyId PubId StoreId BeerId Quantity
1 1 NULL 1 30
2 NULL 1 2 40
3 2 NULL 3 50
4 NULL 2 4 10

I tried doing this query but it gives me no results.

SELECT Be.BeerId
    FROM Beer be
    left outer join Buyer bu
    ON be.BeerId=bu.BeerId
WHERE not exists (
        SELECT TOP(3) BuyId, bu.BeerId, 
            SUM(Quantity) as TotalOrdered
        FROM Buyer bu
        GROUP BY BuyId, bu.BeerId
        ORDER BY SUM(Quantity) DESC)

What I would expect to see is that from the top 3 results, beers that are not ordered are BeerId=4 and BeerId=5

CodePudding user response:

This seems like a classic window function problem. You can use ROW_NUMBER() to enumerate the rows in order of their Quantity totals descending, and filter out the top 3 like so:

WITH _BuyersTotalsRanked AS
(
    SELECT 
        BeerId, -- Same BeerId might've been ordered by different Buyers
        ROW_NUMBER() OVER (ORDER BY TotalOrderedPerBeer DESC) AS BuyersRanked
    FROM 
    (
        SELECT 
            BeerId, 
            SUM(Quantity) AS TotalOrderedPerBeer
        FROM dbo.Buyer
        GROUP BY BuyId, BeerId
    ) AS BuyersTotals
)

SELECT DISTINCT BeerId -- Same BeerId might've been ordered by different Buyers
FROM _BuyersTotalsRanked
WHERE BuyersRanked > 3 -- Filter out the top 3 Buyers Beers

Note that if there's a tie for the top 3, this will return nondeterminsitic (semi-random) results because of the ORDER BY clause in the ROW_NUMBER() function not being unique (same problem with your ORDER BY clause in your example query).

To correct that, you would need to add a uniquifying field to deterministically decide which row wins in a tie. Usually you would use the key of the table. I'm not sure what the key is for your dbo.Buyer table. But whatever it is, or whichever combination of fields are unique, you can update the ROW_NUMBER() clause like so ROW_NUMBER() OVER (ORDER BY TotalOrderedPerBeer DESC, UnqiueSetOfFields). Then those fields would be used to determine the winning row(s) in a tie.


Alternatively, if when there are ties for top 3, and you want to exclude all of those ties as well, you can use a different window function such as DENSE_RANK() like so:

WITH _BuyersTotalsRanked AS
(
    SELECT 
        BeerId,
        DENSE_RANK() OVER (ORDER BY TotalOrderedPerBeer DESC) AS BuyersRanked
    FROM 
    (
        SELECT 
            BeerId, 
            SUM(Quantity) AS TotalOrderedPerBeer
        FROM dbo.Buyer
        GROUP BY BuyId, BeerId
    ) AS BuyersTotals
)

SELECT DISTINCT BeerId -- Same BeerId might've been ordered by different Buyers
FROM _BuyersTotalsRanked
WHERE BuyersRanked > 3 -- Filter out the top 3 Buyers Beers

DENSE_RANK() will enumerate all tied rows with the same value, instead of randomly deciding which one comes first like ROW_NUMBER() does.


Finally, your question (and data / schema) is a little unclear. Based on your example code, I'm interpreting that what you're asking for is to exclude the top 3 Buyers of any single Beer, not top 3 by their told purchases of all Beers. E.g. Buyer1 may have only purchased one type of beer, BeerA for a quantity of 50, but Buyer2 purchased two types of beer, BeerB for 30, and BeerC for 40. In the above code (and your attempted code) Buyer1 would be ranked #1 even though Buyer2 has a greater total purchase of Beers, totaling 70.

If you wanted to rank the Buyers by their total purchases across all Beers, then you'd want to do something like the following:

WITH _BuyersTotalsRanked AS
(
    SELECT 
        BuyId,
        DENSE_RANK() OVER (ORDER BY TotalOrderedAll DESC) AS BuyersRanked
    FROM 
    (
        SELECT 
            BuyId, 
            SUM(Quantity) AS TotalOrderedAll
        FROM dbo.Buyer
        GROUP BY BuyId
    ) AS BuyersTotals
)

SELECT DISTINCT B.BeerId -- Same BeerId might've been ordered by different Buyers
FROM dbo.Buyer AS B
INNER JOIN _BuyersTotalsRanked AS BTR
    ON B.BuyId = BTR.BuyId
WHERE B.BuyersRanked > 3 -- Filter out the top 3 Buyers Beers

CodePudding user response:

select top (3)BeerId from ( SELECT BeerId, SUM(Quantity) Be.BeerId FROM Beer be left outer join Buyer bu ON be.BeerId=bu.BeerId Group by BeerId) my_derived_table ORDER BY BeerId

  • Related