In my case an entity is a recipe and a tag an ingredient. I want to search by ingredient and return all recipes containing the ingredient and return all other ingredients as well. In my case ingredients can have synonyms.
First, here are my tables:
CREATE TABLE `ingredient` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`synonym_group` int DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `recipe` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`type` varchar(45) DEFAULT NULL,
`comment` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `recipe_ingredient` (
`recipe_id` int NOT NULL,
`ingredient_id` int NOT NULL,
`alternative_ingredient_id` int DEFAULT NULL,
`is_optional` tinyint(1) DEFAULT '0',
PRIMARY KEY (`recipe_id`,`ingredient_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Here is the query I've come up with:
SELECT
r.name,
i_all.name
FROM
recipe r
JOIN (
SELECT DISTINCT
recipe_id
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group
WHERE i.name = 'Asd' OR i_syn.name = 'Asd'
) i ON i.recipe_id = r.id
JOIN recipe_ingredient ri ON i.recipe_id = ri.recipe_id
JOIN ingredient i_all ON ri.ingredient_id = i_all.id
I feel like this query is very complicated for what I want to do and there is a much simpler way. Mainly it feels weird that I have to specify the search term twice in the query.
Right now, if I only filter by i.name = 'Asd'
I won't find any synonyms and if I only filter by i_syn.name = 'Asd'
I won't find anything if the search term has no synonyms. Removing the distinct
doesn't seem to change anything.
Is there just no better way or am I thinking too complicated?
Example data for ingredients:
id | name | synonym_group |
---|---|---|
1 | Bell Pepper | 1 |
2 | Capsicum | 1 |
3 | Egg | NULL |
4 | Sugar | NULL |
CodePudding user response:
I would establish all the ingredients in groups and not in groups where not in groups simplify by setting a group to ingredient id then join to recipe to get distinct recipe then the tables can be simply joined. This way the code doesn't care which ingredient or member of group you search on.
drop table if exists ingredient,recipe,recipe_ingredient;
CREATE TABLE `ingredient` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`synonym_group` int DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ;
insert into ingredient (id,name ,synonym_group) values
(1,'egg',null),(2,'bell pepper',10),(3,'capsicum',10),(4,'sugar',null),(5,'chilli pepper',null);
CREATE TABLE `recipe` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`type` varchar(45) DEFAULT NULL,
`comment` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `recipe_ingredient` (
`recipe_id` int NOT NULL,
`ingredient_id` int NOT NULL,
`alternative_ingredient_id` int DEFAULT NULL,
`is_optional` tinyint(1) DEFAULT '0',
PRIMARY KEY (`recipe_id`,`ingredient_id`)
) ;
insert into recipe_ingredient values
(1,1,null,null),(2,2,null,null),(2,5,null,null),(2,4,null,null),(3,3,null,null);
insert into recipe (id, name) values
(1,'stew'),(2,'nei'),(3,'three');
select r.id,r.name,ri.recipe_id,ri.ingredient_id,i.name
from recipe_ingredient ri
join recipe r on r.id = ri.recipe_id
join ingredient i on i.id = ri.ingredient_id
join
(
select distinct ri.recipe_id rid
from recipe_ingredient ri
join
(
select i.id ,sg
from ingredient i
join
(
Select case
when i.synonym_group is not null then i.synonym_group
else i.id
end as sg
from ingredient i where i.name = 'capsicum'
) t on
t.sg = i.synonym_group or
t.sg = i.id
) u on u.id = ri.ingredient_id
) v on v.rid = ri.recipe_id;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=879979bde1118b6e8fb1acf6d090204c
CodePudding user response:
You say that -
Removing the
distinct
doesn't seem to change anything.
which suggests your test data is inadequate. The following illustrates the importance of DISTINCT
in your current query. Given your ingredients listed above and the below recipes -
id | name | type | comment |
---|---|---|---|
1 | recipe 1 | type | comment |
2 | recipe 2 | type | comment |
3 | recipe 3 | type | comment |
4 | recipe 4 | type | comment |
5 | recipe 5 | type | comment |
recipe_id | ingredient_id | alternative_ingredient_id | is_optional |
---|---|---|---|
1 | 1 | NULL | 0 |
1 | 3 | NULL | 0 |
2 | 1 | NULL | 0 |
2 | 4 | NULL | 0 |
3 | 2 | NULL | 0 |
3 | 3 | NULL | 0 |
3 | 4 | NULL | 0 |
4 | 2 | NULL | 0 |
4 | 4 | NULL | 0 |
5 | 1 | NULL | 0 |
5 | 3 | NULL | 0 |
5 | 4 | NULL | 0 |
Your inner query -
SELECT DISTINCT
recipe_id
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';
returns -
recipe_id |
---|
1 |
2 |
3 |
4 |
5 |
but if you drop the DISTINCT
it returns -
recipe_id |
---|
1 |
1 |
2 |
2 |
3 |
4 |
5 |
5 |
If you drop DISTINCT
but add an additional join predicate to stop it LEFT JOINing to itself you get the desired result -
SELECT
recipe_id
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group AND i.id <> i_syn.id
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';
Your current query appears to be optimal and it can be easily modified to allow for searches based on multiple ingredients with a best match type ranking by just modifying the inner query -
SELECT
recipe_id, COUNT(ingredient_id) AS ingredients_matched
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group AND i.id <> i_syn.id
WHERE i.name IN ('Bell Pepper', 'Sugar', 'Egg') OR i_syn.name IN ('Bell Pepper', 'Sugar', 'Egg')
GROUP BY recipe_id