I have a product system that each category can have multiple subcategory layers if there aren't any products attached to them, and we can attach as many products as we want to a category if there aren't any sub-categories attached to it like the example below:
PS: I deleted other columns/constraints from the two tables for simplicity
CREATE TABLE categories (
id serial PRIMARY KEY,
name VARCHAR NOT NULL,
parent_id INT
);
CREATE TABLE products (
id serial PRIMARY KEY,
name VARCHAR NOT NULL,
category_id serial,
CONSTRAINT fk_category FOREIGN KEY(category_id) REFERENCES categories(id)
);
INSERT INTO categories (
id,
name,
parent_id
)
VALUES
(1, 'Electronic Devices', NULL),
(2, 'lap-top', 1),
(3, 'smart-phones', 1),
(4, 'headphones', 1),
(5, 'desctops', 1),
(6, '16inc laptops', 2),
(7, '13inc laptops', 2),
(8, 'smart phones', 3),
(9, 'phablets', 3),
(10, 'wireless headphones', 4),
(11, 'wired headphones', 4),
(12, 'onear headphones', 10),
(13, 'large headphones', 10);
INSERT into products (
id, name, category_id
) VALUES
(1, 'mac-book-16 model 1', 6),
(2, 'mac-book-16 model 2', 6),
(3, 'mac-book-16 model 3', 6),
(4, 'mac-book-13 model 1', 7),
(5, 'mac-book-13 model 2', 7),
(6, 'mac-book-13 model 3', 7),
(7, 'mac-book-13 model 4', 7),
(8, 'iphone 10', 8),
(9, 'iphone 11', 8),
(10, 'iphone 12', 8),
(11, 'iphone 13', 8),
(12, 'iphone 14', 8),
(13, 'iphone 10 pro max', 9),
(14, 'iphone 11 pro max', 9),
(15, 'iphone 12 pro max', 9),
(16, 'iphone 13 pro max', 9),
(17, 'iphone 14 pro max', 9),
(18, 'galegxy note 8', 9),
(19, 'galegxy note 9', 9),
(20, 'galegxy note 10', 9),
(21, 'some headphones', 12),
(22, 'samsung galexypods', 12),
(23, 'apple earpods', 12),
(24, 'apple earpod pro', 13);
Let's say I want to get all the products related to "Electronic Devices" how I'm planning to do it like below:
WITH RECURSIVE products_in_category AS (
SELECT
p.*,
c.parent_id
FROM
products p
INNER join categories c on c.id = p.category_id
WHERE
category_id = 1
UNION
SELECT
p2.*,
c2.parent_id
FROM
products p2
INNER join categories c2 on c2.id = p2.category_id
INNER JOIN products_in_category s ON s.category_id = c2.parent_id
) SELECT
*
FROM
products_in_category limit 25;
I'm expecting to see all 24 rows in products, but I'm getting 0.
A: Is it possible with WITH RECURSIVE? , if it is how?
B: Is it a good way to do it this way(I do create indexes on my id fields)? is it scaleable (when we have 500 categories & 10k products for example)? if not what are the alternatives?
CodePudding user response:
To answer the indirect question of "why doesn't this work"...
First, you should only recurse that part of the structure that needs it; the categories in your case
- build the tree of categories first, inside the recursive CTE
- join the products to it outside of the CTE
Second, your expected results don't need a recursive query, just a join.
SELECT
p.*,
c.parent_id
FROM
products p
INNER join
categories c
ON c.id = p.category_id
To get all products within an arbitrary category (and its children categories)...
WITH RECURSIVE
recursed_category AS
(
SELECT
id AS root_category_id,
name AS root_category_name,
id AS category_id,
name AS category_name
FROM
categories
UNION ALL
SELECT
p.root_category_id,
p.root_category_name,
c.id,
c.name
FROM
recursed_category AS p -- parent
INNER JOIN
categories AS c -- child
ON c.parent_id = p.category_id
)
SELECT
c.*,
p.id AS product_id,
p.name AS product_name
FROM
recursed_category AS c -- category
INNER join
products AS p -- product
ON c.id = p.category_id
WHERE
c.root_category_id = ???
This builds multiple trees, one for each category, and travels to every child node of each category.
Then in the WHERE
clause you can specify which one you're actually interested in.
As SQL is Declarative, the optimiser willl apply the outer WHERE
clause to the CTE, and not build all the trees then throw some of them away.
Then, it's the simple query again; just take the category tree and join the products on, in the outer query.
NOTE: This expressly starts at a category and browses down. If you wanted to know which categories a product is in, do NOT use this. Make your recursive CTE similarly, but start at the children and browse up. Otherwise, the optimiser won't be able to push the where clause up to the CTE.
CodePudding user response:
In this example as you can see on my Git Repos also : https://github.com/WalterMatsuda/WITH-RECURSIVE-x-SELECT
The difference of the same operation made with a LOOP and subqueries is : In a 1000000 Rows tables with 5 levels of hierarchy
For Loops and subqueries
Query OK (execution time: 15.359 sec; total time: 15.422 sec)
For With Recursive
Query OK (execution time: 62 ms; total time: 125 ms)
Both returning 1667 values . You can try a more efficient looping logic than the most easier, but WITH RECURSIVE is made to use recursion. Loops in other hand when you dont need to access multiple level,only want to repeat something withouth going to another conditional repetition is more efficient than Recursion. So both should be used in their proper roles.