Search entity by tag and return all other tags as well in MySQL-CodePudding

In my case an entity is a recipe and a tag an ingredient. I want to search by ingredient and return all recipes containing the ingredient and return all other ingredients as well. In my case ingredients can have synonyms.

First, here are my tables:

CREATE TABLE `ingredient` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `synonym_group` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `recipe` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `type` varchar(45) DEFAULT NULL,
  `comment` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `recipe_ingredient` (
  `recipe_id` int NOT NULL,
  `ingredient_id` int NOT NULL,
  `alternative_ingredient_id` int DEFAULT NULL,
  `is_optional` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`recipe_id`,`ingredient_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Here is the query I've come up with:

SELECT 
  r.name, 
  i_all.name

FROM
  recipe r
  
JOIN (
    SELECT DISTINCT 
        recipe_id 
    FROM 
        ingredient i 
        JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
        LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group 
    WHERE i.name = 'Asd' OR i_syn.name = 'Asd'
) i ON i.recipe_id = r.id 

JOIN recipe_ingredient ri ON i.recipe_id = ri.recipe_id 
JOIN ingredient i_all ON ri.ingredient_id = i_all.id

I feel like this query is very complicated for what I want to do and there is a much simpler way. Mainly it feels weird that I have to specify the search term twice in the query. Right now, if I only filter by i.name = 'Asd' I won't find any synonyms and if I only filter by i_syn.name = 'Asd' I won't find anything if the search term has no synonyms. Removing the distinct doesn't seem to change anything.

Is there just no better way or am I thinking too complicated?

Example data for ingredients:

id	name	synonym_group
1	Bell Pepper	1
2	Capsicum	1
3	Egg	NULL
4	Sugar	NULL

CodePudding user response：

I would establish all the ingredients in groups and not in groups where not in groups simplify by setting a group to ingredient id then join to recipe to get distinct recipe then the tables can be simply joined. This way the code doesn't care which ingredient or member of group you search on.

drop table if exists ingredient,recipe,recipe_ingredient;
CREATE TABLE `ingredient` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `synonym_group` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name_UNIQUE` (`name`)
) ;
insert into ingredient (id,name ,synonym_group) values
(1,'egg',null),(2,'bell pepper',10),(3,'capsicum',10),(4,'sugar',null),(5,'chilli pepper',null);

CREATE TABLE `recipe` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `type` varchar(45) DEFAULT NULL,
  `comment` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `recipe_ingredient` (
  `recipe_id` int NOT NULL,
  `ingredient_id` int NOT NULL,
  `alternative_ingredient_id` int DEFAULT NULL,
  `is_optional` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`recipe_id`,`ingredient_id`)
) ;

insert into recipe_ingredient values
(1,1,null,null),(2,2,null,null),(2,5,null,null),(2,4,null,null),(3,3,null,null);

insert into recipe (id, name) values
(1,'stew'),(2,'nei'),(3,'three');

select r.id,r.name,ri.recipe_id,ri.ingredient_id,i.name
from recipe_ingredient ri
join recipe r on r.id = ri.recipe_id
join ingredient i on i.id = ri.ingredient_id
join
(
select distinct ri.recipe_id rid
from recipe_ingredient ri
join
    (
    select i.id ,sg
    from ingredient i 
    join
    (
    Select case 
        when i.synonym_group is not null then i.synonym_group 
        else i.id
        end as sg
    from ingredient i where i.name = 'capsicum' 
    ) t on 
    t.sg = i.synonym_group or 
    t.sg = i.id
) u on u.id = ri.ingredient_id
) v on v.rid = ri.recipe_id;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=879979bde1118b6e8fb1acf6d090204c

CodePudding user response：

You say that -

Removing the distinct doesn't seem to change anything.

which suggests your test data is inadequate. The following illustrates the importance of DISTINCT in your current query. Given your ingredients listed above and the below recipes -

id	name	type	comment
1	recipe 1	type	comment
2	recipe 2	type	comment
3	recipe 3	type	comment
4	recipe 4	type	comment
5	recipe 5	type	comment

recipe_id	ingredient_id	alternative_ingredient_id
1	1	NULL
1	3	NULL
2	1	NULL
2	4	NULL
3	2	NULL
3	3	NULL
3	4	NULL
4	2	NULL
4	4	NULL
5	1	NULL
5	3	NULL
5	4	NULL

Your inner query -

SELECT DISTINCT 
    recipe_id 
FROM 
    ingredient i 
    JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
    LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';

returns -

recipe_id
1
2
3
4
5

but if you drop the DISTINCT it returns -

recipe_id
1
1
2
2
3
4
5
5

If you drop DISTINCT but add an additional join predicate to stop it LEFT JOINing to itself you get the desired result -

SELECT
    recipe_id 
FROM 
    ingredient i 
    JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
    LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group AND i.id <> i_syn.id
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';

Your current query appears to be optimal and it can be easily modified to allow for searches based on multiple ingredients with a best match type ranking by just modifying the inner query -

SELECT 
    recipe_id, COUNT(ingredient_id) AS ingredients_matched
FROM 
    ingredient i 
    JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
    LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group AND i.id <> i_syn.id
WHERE i.name IN ('Bell Pepper', 'Sugar', 'Egg') OR i_syn.name IN ('Bell Pepper', 'Sugar', 'Egg')
GROUP BY recipe_id