I got an LEFT JOIN exercise at school:
"List all category names with the number of their products."
Used were two tables from the northwind DB: products (77 rows) and categories (8 rows)
I thought the product table should come first, since the main-data (number of products) will be found there and only the 8 category names will be needed from the joined table. Our teacher argued, that the categories table needs to be the main table, but i still can't understand why.
The two queries are:
SELECT C.CategoryID, CategoryName, COUNT(ProductID) [Count]
FROM Categories C LEFT JOIN Products P
ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, CategoryName
and
SELECT P.CategoryID, CategoryName, COUNT(ProductID) [Count]
FROM Products P LEFT JOIN Categories C
ON P.CategoryID = C.CategoryID
GROUP BY CategoryName, P.CategoryID
Can anybody explain to me why, in this case, a certain order of used tables matters in terms of theoretical performance?
And if: how so? (does size matter?;))
CodePudding user response:
The name of the exercise tells yo what is the first table in your case.
"List all category names with the number of their products."
So get the all category names. Category names is what you HAVE TO SHOW - ALL OF THEM. You want to show all of them regardless of the fact is there a matching CategoryID
in the Products
table.
For example, if you want to show all product names with number of their categories then you want to show all product names regardless if there exists matching ProductID
in Categories
table.
This demo shows you what the two queries will return if we have 3 categories and one product. It is not the best demo in the world but it does the trick I believe.
The tables:
create table Categories (CategoryID int, CategoryName varchar(20))
create table Products (ProductID int, CategoryID int)
The data:
insert into Categories values(1, 'Cat1');
insert into Categories values(2, 'Cat2');
insert into Categories values(3, 'Cat3');
insert into Products values(1, 1);
Query1:
SELECT C.CategoryID, CategoryName, COUNT(ProductID) as Cnt
FROM Categories C
LEFT JOIN Products P ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, CategoryName
Result1:
CategoryID CategoryName Cnt
1 Cat1 1
2 Cat2 0
3 Cat3 0
Query2:
SELECT P.CategoryID, CategoryName, COUNT(ProductID) as Cnt
FROM Products P
LEFT JOIN Categories C ON P.CategoryID = C.CategoryID
GROUP BY CategoryName, P.CategoryID
Result2:
CategoryID CategoryName Cnt
1 Cat1 1
I see in your question that you say: "Used were two tables from the northwind DB: products (77 rows) and categories (8 rows)" So maybe it is strange now for you how can my example be like this and yours "since the results of both queries are obviousely the same" ?
Here is the demo that will show you how it can be the same with different set of data.