I have relational category & product tables. Categories are hierarchical. I will have queries based on category, for example
select *
from products
where CatId = 3
or
select *
from products
where CatId = 1
I have 6 level of category and 24 million row for products, I have to find fast and optimal solutions. My question is which structure is suitable.
I write some options, feel free to suggest a better alternative.
Current category table:
Id ParentId Name
---------------------
1 null CatA
2 null CatB
3 1 CatAa
4 2 CatBa
Product table option 1
Id Cat Name
------------------
1 3 Product_1
2 4 Product_2
Product table option 2
Id CatLevel1 CatLevel2 ... Name
-------------------------------------
1 1 3 . Product_1
2 2 4 . Product_2
Product table option 3
Id Cats Name
------------------
1 1:3 Product_1
2 2:4 Product_2
CodePudding user response:
Always keep option one, plus some denormalised tables (options two onwards) if you so desire. By keeping option one, you have the source truth to revert to or derive the others from.
Option two is only recommended if the searcher always knows what depth/level to search at. For example, if they know they need Level2=CATAb
then it works, but if they don't know CATAb
is at level two, they don't know which column to look in. It also relies on knowing how many levels to represent; if you can have a hundred levels, you need a hundred columns, and it's fragile of you need to add more depths. Generally, this doesn't apply and so is generally not a good optimisation.
Option three is a straight no. Never store multiple values in a one field (one column of one row). It will make Efficient searching of that column next to impossible.
The alternative to option three is to have a "link" table. Just two columns, category_id and product_id. Then you list all ancestors of a product, just on different rows.
category_id | product_id |
---|---|
1 | 1 |
3 | 1 |
2 | 2 |
4 | 2 |
These are all known as adjacency lists. A different model altogether is Nested Sets. I'm on my phone, and it's hard to describe without lots of formatting, but if you research online you'll find lots of information. They're much harder to comprehend and implement Initially, but very fast at retrieval when specifying a parent.
CodePudding user response:
Your product
table option 1 is fine and need no change
product_id
,
category_id
,
... other attributes
Your problem is in accessing the product based on the category hierarchy - which would make a need of a hierarchical query to get all categories in the tree below your selected category.
Instead of
select * from product where category_id = 1;
you'll need to write an additional hierarchical query to get the whole hierarchy tree
with cat_tree (id) as (
select id
from category where id = 1
UNION ALL
select ca.id
from cat_tree ct
join category ca
on ct.id = ca.parent_id
)
select * from product
where category_id in
(select id from cat_tree);
Which may not be practicable, but you may simplify it by denormalizing the category table
Let's assume your category data is such as
ID PARENT_ID
---------- ----------
1
3 1
5 3
6 3
The query below, which may be implemented as a MATERIALIZED VIEW that is refreshed on each category change pre-calculates all direct and indirect parent and child relations.
The result is
ID CHILD_ID
---------- ----------
1 1
1 3
1 5
1 6
3 3
3 5
3 6
5 5
6 6
E.g. for 1
you get itself, all its child's, their child's etc.
Using this category_denorm
object your query can be simplified to
select *
from product
where category_id in
(select child_id from category_denorm where id = 1);