Home > OS >  Returning rows that match a list of inputs
Returning rows that match a list of inputs

Time:01-09

I have a SQL Server database that stores recipes and in this database there is a view that returns

ingredient_name, ingredient_plural_name, recipe_id, recipe_name, recipe_description, tag_name 

(among other information not pertinent to this question).

The way the database is set up, this view will return a new row for each combination of ingredient and tag in the recipe (my test entry has 7 ingredients and 2 tags, so it returns 14 rows for this recipe) like this:

ingredient_name ingredient_plural_name tag_name etc.
all purpose flour NULL dessert The rest of the data is the same in all records
all purpose flour NULL simple
baking powder NULL dessert
baking powder NULL simple
egg eggs dessert
egg eggs simple
milk NULL dessert
milk NULL simple
unsalted butter NULL dessert
unsalted butter NULL simple
vanilla extract NULL dessert
vanilla extract NULL simple
white sugar NULL dessert
white sugar NULL simple

I want to write a stored procedure that returns the recipe_name, recipe_description, and recipe_id only if the tags and/or name/plural name contain all values provided as a parameter input.

The recipe can have more tags and ingredients than the ones specified, but it must contain all of the ones specified. For example, it should return this sample recipe if I do

EXEC search_include @includeTags= 'milk, eggs'

but NOT if I do

EXEC search_include @includeTags= 'milk, eggs, cheese'

Am I missing something, or is this going to be a bit convoluted?

I currently have this code:

PROCEDURE [dbo].[search_include]
    (@includeTags varchar(max))
AS
BEGIN
    SELECT DISTINCT 
        recipe_name, recipe_description, recipe_id
    FROM 
        recipe_view 
    WHERE 
        tag_name IN (SELECT VALUE FROM dbo.split(@includeTags))
        OR ingredient_name IN (SELECT VALUE FROM dbo.split(@includeTags))
        OR ingredient_plural_name IN (SELECT VALUE FROM dbo.split(@includeTags))
END

And it works to only show results where tags and ingredients are included in the list, but I need it to only return results of recipes that match ALL of the inputs.

I have also tried this approach:

BEGIN
    CREATE TABLE #tags (tag varchar(100))

    INSERT INTO #tags 
        SELECT VALUE FROM dbo.split(@includeTags)

    SELECT recipe_id
    FROM recipe_view 
    WHERE tag_name IN (SELECT tag FROM #tags)
       OR ingredient_name IN (SELECT tag FROM #tags)
       OR ingredient_plural_name IN (SELECT tag FROM #tags)
    GROUP BY recipe_id 
    HAVING COUNT(*) = (SELECT COUNT(*) FROM #tags)

    DROP TABLE #tags
END

From some similar (but not quite the same) questions from this site. This one gives me no results whatsoever, no matter what I type for the parameters.

The dbo.split function just takes a comma separated list as a varchar string and returns a pseudo table of each item in a new row in a "VALUE" column.

I'm not great with SQL by any means, and I'm having some difficulty wrapping my head around getting this right.

CodePudding user response:

I think the issue is with the view that creates multiple rows for the same ingredient. For example, if you had the two tags and just did a SELECT with the above e.g.,

SELECT *
FROM #recipe_view 
WHERE tag_name IN (SELECT tag FROM #tags)
    OR ingredient_name IN (SELECT tag FROM #tags)
    OR ingredient_plural_name IN (SELECT tag FROM #tags);

It will show 4 rows

ingredient_name ingredient_plural_name  tag_name    recipe_id
egg             eggs                    dessert     cake
egg             eggs                    simple      cake
milk            null                    dessert     cake
milk            null                    simple      cake

Therefore COUNT(*) will return 4 - and not match the HAVING clause (which is expecting a count of 2).

I suggest

  • Instead of having the 'recipe_view' look like that, instead you combine all relevant ingredients/tags etc into a single column
  • Do a matching similar to your answer above.

Here is a db<>fiddle that starts with the recipe_view, converts it to a single column to check (recipe_tag) then does the check.

I expect you have a better way of doing this (given that the view is already selecting data from elsewhere) but the 'view' I use (actually a temporary table called recipe_tags) has the following data/structure - with all the tags/etc relevant to the recipe.

recipe_id   recipe_tag
cake        all purpose flour
cake        baking powder
cake        dessert
cake        egg
cake        eggs
cake        milk
cake        simple
cake        unsalted butter
cake        vanilla extract
cake        white sugar

Then you can do a similar approach as you have above against this table

SELECT recipe_id
FROM #recipe_tags
   INNER JOIN #tags ON #recipe_tags.recipe_tag = #tags.tag
GROUP BY recipe_id 
HAVING COUNT(*) = (SELECT COUNT(*) FROM #tags);

This has the expected output - the recipe_id (in this case 'cake').

CodePudding user response:

What you want is a query that: Given a distinct list of recipes, select each recipe where all tags are present in the ingredients for that recipe.

The "tags are present in the ingredients" is logically equivalent to "where none of the tags are missing from the ingredients". This can be achieved with a nested WHERE NOT EXISTS() condition.

Something like:

SELECT R.*
FROM (
    SELECT DISTINCT recipe_name, recipe_description, recipe_id
    FROM recipe_view R
) R
WHERE NOT EXISTS(
    SELECT *
    FROM #tags T
    WHERE NOT EXISTS(
        SELECT *
        FROM recipe_view I
        WHERE I.recipe_id = R.recipe_id
        AND (
            T.tag = I.tag_name
            OR T.tag = I.ingredient_name 
            OR T.tag = I.ingredient_plural_name
        )
    )
)

See this db<>fiddle.

Using the view repeatedly in this query can be inefficient. If you hae access to the underlying tables, it may be wise to rewrite the above to access those tables directly.

CodePudding user response:

One option to achieve this functionality is to union all of the three columns that are required to match with, then use the distinct count to check if the three columns have all of the tag values.

CREATE PROCEDURE search_include
  @includeTags varchar(128)
AS
BEGIN
  CREATE TABLE #tags (tag varchar(100));
  INSERT INTO #tags 
        SELECT VALUE FROM STRING_SPLIT(REPLACE(@includeTags, ', ', ','), ',');

  with rv as 
  (
    SELECT recipe_id, ingredient_name as attribute_value
    FROM recipe_view WHERE ingredient_name IN (SELECT tag FROM #tags)
    UNION ALL
    SELECT recipe_id, ingredient_plural_name
    FROM recipe_view WHERE ingredient_plural_name IN (SELECT tag FROM #tags)
    UNION ALL
    SELECT recipe_id, tag_name
    FROM recipe_view WHERE tag_name IN (SELECT tag FROM #tags)
  )
  SELECT recipe_id
  FROM rv 
  GROUP BY recipe_id 
  HAVING COUNT(distinct attribute_value) = (SELECT COUNT(*) FROM #tags);
  
  DROP TABLE #tags;
END

The use of REPLACE('milk, dessert', ', ',',') inside the split function is to remove the space that may exist in the parameter value, i.e.'milk, eggs' -> 'milk,eggs'.

see demo

  • Related