Been struggling with this problem, can't figure it out. My simplified table schemas are:
- SalesOrderHeader(SalesOrderID int, ShipToAddressID int),
- SalesOrderDetails(SalesOrderID int, ProductID int),
- Address(ShipToAddressID int, City nvarchar),
- Product(ProductID int, ProductCategory int)
- ProductCategory(ProductCategoryID int, Name nvarchar).
I tried to find the 3 most popular cities(the ones which have the most orders delivered to) and the most popular product categories in these cities, but unfortunately, can't make it work properly.
select count(*) as OrderNum, ProductCategory.Name, City from SalesLT.SalesOrderDetail
left join SalesLT.SalesOrderHeader on SalesLT.SalesOrderDetail.SalesOrderID = SalesLT.SalesOrderHeader.SalesOrderID
left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
left join SalesLT.Product on SalesOrderDetail.ProductID = Product.ProductID
left join SalesLT.ProductCategory on ProductCategory.ProductCategoryID = Product.ProductCategoryID
where City in
(select top(3) City from SalesLT.SalesOrderHeader
left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
group by City
order by Count(*) desc)
group by City, ProductCategory.Name
order by count(*) desc
I tried to change the query to make it return only 1 position for each city, but it didn't work. Would be grateful to hear an advice, thank you.
Query returns following result
CodePudding user response:
You can use the ROW_NUMBER()
windowing function to assign rankings to each result partitioned by city and ordered by descending count. If you wrap that in a Common Table Expression (CTE), you can then add a final select which just selects the top (ranking = 1) row for each city.
Something like:
with CTE as (
select
count(*) as OrderNum,
ProductCategory.Name,
City,
row_number() over(partition by City order by count(*) desc) as Ranking
from SalesLT.SalesOrderDetail
left join SalesLT.SalesOrderHeader on SalesLT.SalesOrderDetail.SalesOrderID = SalesLT.SalesOrderHeader.SalesOrderID
left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
left join SalesLT.Product on SalesOrderDetail.ProductID = Product.ProductID
left join SalesLT.ProductCategory on ProductCategory.ProductCategoryID = Product.ProductCategoryID
where City in (
select top(3) City
from SalesLT.SalesOrderHeader
left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
group by City
order by Count(*) desc
)
group by City, ProductCategory.Name
)
select OrderNum, Name, City
from CTE
where Ranking = 1
order by City
If you wanted to display more than one category per city (say the top 3), you can adjust the final condition to something like where Ranking <= 3
.
If you want to better handle ties (showing all categories tied with the same high count), you can use RANK()
instead of ROW_NUMBER()
.
SIDE NOTE: City names may not be unique across states, provinces, countries, etc. You might want to look into expanding on the group by City logic.
CodePudding user response:
I have based SQL on your table descriptions. This should work (in SQL SERVER)
SELECT bestCity.City,
(SELECT TOP 1 ProductCategory.ProductCategoryID
FROM SalesOrderDetail
INNER JOIN SalesOrderHeader ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
INNER JOIN Address ON Address.ShipToAddressID = SalesOrderHeader.ShipToAddressID
INNER JOIN Product ON SalesOrderDetail.ProductID = Product.ProductID
INNER JOIN ProductCategory ON ProductCategory.ProductCategoryID = Product.ProductCategoryID
WHERE Address.City= bestCity.City
GROUP BY ProductCategory.ProductCategoryID
ORDER BY count(*) DESC) TopCategory
FROM
(SELECT top(3) City
FROM SalesOrderHeader
LEFT JOIN Address ON Address.ShipToAddressID = SalesOrderHeader.ShipToAddressID
GROUP BY City
ORDER BY Count(*) DESC) bestCity