Home > database >  SQL query, return the most popular category for each city
SQL query, return the most popular category for each city

Time:11-23

Been struggling with this problem, can't figure it out. My simplified table schemas are:

  1. SalesOrderHeader(SalesOrderID int, ShipToAddressID int),
  2. SalesOrderDetails(SalesOrderID int, ProductID int),
  3. Address(ShipToAddressID int, City nvarchar),
  4. Product(ProductID int, ProductCategory int)
  5. ProductCategory(ProductCategoryID int, Name nvarchar).

I tried to find the 3 most popular cities(the ones which have the most orders delivered to) and the most popular product categories in these cities, but unfortunately, can't make it work properly.

select count(*) as OrderNum, ProductCategory.Name, City from SalesLT.SalesOrderDetail
 left join SalesLT.SalesOrderHeader on SalesLT.SalesOrderDetail.SalesOrderID = SalesLT.SalesOrderHeader.SalesOrderID
 left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
 left join SalesLT.Product on SalesOrderDetail.ProductID = Product.ProductID
 left join SalesLT.ProductCategory on ProductCategory.ProductCategoryID = Product.ProductCategoryID
 where City in 
(select top(3) City from SalesLT.SalesOrderHeader
left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
group by City
order by Count(*) desc)
group by City, ProductCategory.Name
order by count(*) desc

I tried to change the query to make it return only 1 position for each city, but it didn't work. Would be grateful to hear an advice, thank you.

Query returns following result

CodePudding user response:

You can use the ROW_NUMBER() windowing function to assign rankings to each result partitioned by city and ordered by descending count. If you wrap that in a Common Table Expression (CTE), you can then add a final select which just selects the top (ranking = 1) row for each city.

Something like:

with CTE as (
    select
        count(*) as OrderNum,
        ProductCategory.Name,
        City,
        row_number() over(partition by City order by count(*) desc) as Ranking
    from SalesLT.SalesOrderDetail
     left join SalesLT.SalesOrderHeader on SalesLT.SalesOrderDetail.SalesOrderID = SalesLT.SalesOrderHeader.SalesOrderID
     left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
     left join SalesLT.Product on SalesOrderDetail.ProductID = Product.ProductID
     left join SalesLT.ProductCategory on ProductCategory.ProductCategoryID = Product.ProductCategoryID
    where City in (
        select top(3) City
        from SalesLT.SalesOrderHeader
        left join SalesLT.Address on SalesLT.Address.AddressID = SalesOrderHeader.ShipToAddressID
        group by City
        order by Count(*) desc
    )
    group by City, ProductCategory.Name
)
select OrderNum, Name, City
from CTE
where Ranking = 1
order by City

If you wanted to display more than one category per city (say the top 3), you can adjust the final condition to something like where Ranking <= 3.

If you want to better handle ties (showing all categories tied with the same high count), you can use RANK() instead of ROW_NUMBER().

SIDE NOTE: City names may not be unique across states, provinces, countries, etc. You might want to look into expanding on the group by City logic.

CodePudding user response:

I have based SQL on your table descriptions. This should work (in SQL SERVER)

SELECT bestCity.City,
  (SELECT TOP 1 ProductCategory.ProductCategoryID
   FROM SalesOrderDetail
   INNER JOIN SalesOrderHeader ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
   INNER JOIN Address ON Address.ShipToAddressID = SalesOrderHeader.ShipToAddressID
   INNER JOIN Product ON SalesOrderDetail.ProductID = Product.ProductID
   INNER JOIN ProductCategory ON ProductCategory.ProductCategoryID = Product.ProductCategoryID
   WHERE Address.City= bestCity.City
   GROUP BY ProductCategory.ProductCategoryID
   ORDER BY count(*) DESC) TopCategory
FROM
  (SELECT top(3) City
   FROM SalesOrderHeader
   LEFT JOIN Address ON Address.ShipToAddressID = SalesOrderHeader.ShipToAddressID
   GROUP BY City
   ORDER BY Count(*) DESC) bestCity

  • Related