Can't query due to aggregated data, why?

Time:12-03

We have a database for 3 book shops, each with an attached inventory of books with varying numbers of units in stock. The query should display each bookstore (so 3 rows), followed by the quantity of the book with the most units in stock at that store (calculated with MAX(INV.UnitsInStock)), and finally a third column that displays the title of that book.

SELECT BS.Name, B.Title, MAX(UnitsInStock) AS 'Quantity'
FROM Inventories AS INV
JOIN BookShops AS BS ON BS.Id = INV.ShopId
JOIN Books AS B ON B.Id = INV.BookId
GROUP BY BS.Name

This gives me the following error:

Column 'Books.Title' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I also tried this:

SELECT BS.Name, MAX(UnitsInStock) AS 'Quantity'
FROM Inventories AS INV
JOIN BookShops AS BS ON BS.Id = INV.ShopId
JOIN Books AS B ON B.Id = INV.BookId
GROUP BY BS.Name

This shows the correct data so far but without the title of the book.

I've tried temp tables, string_agg() (which correctly displays every single book), and hardcoding each book after finding out exactly which one it was, etc.

How can I fix this?

Answer:

The error message is right. You can't do it that way.

Imagine we still group by BS.Name, but do not include MAX(UnitsInStock) in the SELECT list, and instead only include B.Title. Assuming every shop has many books, which one should be shown on each row?

Now remember that each item in the SELECT list is independent of the others. There is nothing to correlate the MAX(UnitsInStock) entry with the book title. Even more so, you could have both MAX(UnitsInStock) and MIN(UnitsInStock) in the SELECT list. Which book title should be shown then?

In short, you can't show a field in the SELECT list unless you either group by that field or use it inside an aggregate function like MAX(), AVG(), etc.
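To make the independence point concrete, here is a query that does compile, because B.Title is now wrapped in an aggregate, but it still does not answer the question. This is only an illustration and assumes the table and column names from the question. MAX(B.Title) is simply the title that sorts last at each shop under the column's collation; it is computed separately from MAX(UnitsInStock), so it need not be the book with the most stock.

```sql
-- Compiles, but the two aggregates are computed independently of each other:
-- MAX(B.Title) is the title that sorts last at each shop,
-- not necessarily the title of the book with the most units in stock.
SELECT BS.Id, BS.Name,
       MAX(B.Title)          AS SomeTitle,
       MAX(INV.UnitsInStock) AS Quantity
FROM Inventories AS INV
JOIN BookShops AS BS ON BS.Id = INV.ShopId
JOIN Books     AS B  ON B.Id  = INV.BookId
GROUP BY BS.Id, BS.Name;
```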

Instead, to solve this, you have three options. I'll list them in order from worst to best.

Option 1 is JOINing to two extra subqueries. The first subquery looks much like the original query in the question: it returns the shop ID and the MAX(UnitsInStock) for each shop. The second JOIN goes to another subquery that looks at the quantity for ALL books, with a condition in the ON clause so that only the row whose quantity equals the maximum from the previous JOIN will match.

This is bad enough that I'm not even going to show the code. It's a lot more code and more joins/read IOPs, and it can create extra duplicate rows if you're not careful about ties for the highest inventory item at a shop. But it's how we had to do things before 2012 (or late 2018 if you were on MySQL), and it's how you'd have to do it on a database that doesn't support the next two options.

Option 2 uses an APPLY operation (also called a LATERAL JOIN). It looks like this:

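SELECT BS.Name, B.Title, INV.UnitsInStock AS Quantity
FROM BookShops AS BS
OUTER APPLY (
    -- For each shop, pick the single inventory row with the most units
    SELECT TOP 1 i.BookId, i.UnitsInStock
    FROM Inventories AS i
    WHERE i.ShopId = BS.Id
    ORDER BY i.UnitsInStock DESC
) AS INV
-- LEFT JOIN (not INNER) so shops with no inventory still appear, as OUTER APPLY intends
LEFT JOIN Books AS B ON B.Id = INV.BookId;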

This isn't a bad solution, but it's not as fast as the next option I will show. Still, it can be useful if you have a conflict using the next option.
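One thing to watch with TOP 1 is ties: if two books share the highest stock count at a shop, SQL Server returns just one of them, and which one is not guaranteed. If you would rather see every tied book, T-SQL's TOP ... WITH TIES can be used inside the APPLY. This is a sketch using the table names from the question; be aware it can return more than one row per shop.

```sql
SELECT BS.Name, B.Title, INV.UnitsInStock AS Quantity
FROM BookShops AS BS
OUTER APPLY (
    -- WITH TIES also returns any rows that tie with the last row of TOP (1)
    -- on the ORDER BY column, so every book sharing the maximum comes back
    SELECT TOP (1) WITH TIES i.BookId, i.UnitsInStock
    FROM Inventories AS i
    WHERE i.ShopId = BS.Id
    ORDER BY i.UnitsInStock DESC
) AS INV
LEFT JOIN Books AS B ON B.Id = INV.BookId;
```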

The final (best!) option uses the row_number() windowing function:

SELECT Name, Title, UnitsInStock AS Quantity
FROM (
    SELECT BS.Name, B.Title, INV.UnitsInStock,
           -- Number each shop's rows from most to fewest units; rn = 1 is the top book
           ROW_NUMBER() OVER (PARTITION BY BS.Id ORDER BY INV.UnitsInStock DESC) AS rn
    FROM BookShops AS BS
    INNER JOIN Inventories AS INV ON INV.ShopId = BS.Id
    INNER JOIN Books AS B ON B.Id = INV.BookId
) AS t
WHERE rn = 1;

Windowing functions are an important new(-ish) feature to add to your query toolbelt.
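ROW_NUMBER() also breaks ties arbitrarily: if two books share a shop's highest count, only one of them gets rn = 1, and which one is not guaranteed. Swapping in RANK() gives every tied row the same number, so all of the top books come through (again, possibly more than one row per shop). A sketch, assuming the question's table names:

```sql
SELECT Name, Title, UnitsInStock AS Quantity
FROM (
    SELECT BS.Name, B.Title, INV.UnitsInStock,
           -- RANK() assigns equal values to ties, so every top book gets rnk = 1
           RANK() OVER (PARTITION BY BS.Id ORDER BY INV.UnitsInStock DESC) AS rnk
    FROM BookShops AS BS
    INNER JOIN Inventories AS INV ON INV.ShopId = BS.Id
    INNER JOIN Books AS B ON B.Id = INV.BookId
) AS t
WHERE rnk = 1;
```

If you want exactly one row per shop but a repeatable choice, keep ROW_NUMBER() and add a tie-breaker column to its ORDER BY instead, for example `ORDER BY INV.UnitsInStock DESC, B.Title`.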

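Note that in each of my examples I also "grouped" by the shop Id, rather than Name. The original queries in the question should group by both of those fields. Grouping by a Name field alone is seldom wise, since two different shops that happen to share a name would be collapsed into a single row.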
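Applied to the second query from the question, that advice might look like this (it still returns no title; that is what the options above add). It is a sketch that keeps the question's joins and only changes the SELECT list and GROUP BY.

```sql
-- One row per shop, keyed by Id so same-named shops stay separate
SELECT BS.Id, BS.Name, MAX(INV.UnitsInStock) AS Quantity
FROM Inventories AS INV
JOIN BookShops AS BS ON BS.Id = INV.ShopId
JOIN Books     AS B  ON B.Id  = INV.BookId
GROUP BY BS.Id, BS.Name;
```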

Answer:

I think you need to add B.Title to the GROUP BY:

SELECT BS.Name,B.Title, MAX(UnitsInStock) AS 'Quantity'
FROM Inventories AS INV
JOIN BookShops AS BS ON BS.Id = INV.ShopId
JOIN Books AS B ON B.Id = INV.BookId
GROUP BY BS.Name, B.Title