Find movies connected to all given tags-CodePudding

I don't know how to phrase my SQL question so I will type out an expected scenario.

MovieID	TagID
1	1
1	3
1	5
2	1
2	2
2	3
3	1
3	3
3	5
3	7

I want to select movieIds that have tag relationship with at least ALL these tags [1,3,5].
So the expected output would be 1,3.

CodePudding user response：

You can aggregate the tagids into an array and use the contains operator:

select movieid
from the_table
where tagid in (1,3,5)
group by movieid
having array_agg(tagid) @> array[1,3,5]

CodePudding user response：

A classical case of relational-division.

Here is an arsenal of query techniques:

How to filter SQL results in a has-many-through relation

The best solution depends on the complete situation. Postgres version, table size, cardinalities, indexes, performance requirements, result columns, ...

Typically among the fastest options:

SELECT movie_id
FROM   tbl t1
JOIN   tbl t2 USING (movie_id)
JOIN   tbl t3 USING (movie_id)
WHERE  t1.tag_id = 1
AND    t2.tag_id = 3
AND    t3.tag_id = 5;

Needs an index on (tag_id), or better yet on (tag_id, movie_id) to be fast.

If tbl implements a many-to-many relationship - see:

How to implement a many-to-many relationship in PostgreSQL?

Then there should be a PRIMARY KEY or UNIQUE constraint on these two columns. The query at hand prefers (tag_id, movie_id) over (movie_id, tag_id). See:

Is a composite index also good for queries on the first field?

If column statistics are up to date (see autovacuum, ANALYZE), chances are that Postgres will start with the most selective tag to eliminate non-qualifying movies early.

CodePudding user response：

I'm sure a_horse_with_no_name's answer works, and it's very clean.

Here's an approach that uses more vanilla sql.

WITH  vals (search_value) AS (VALUES (1),(3),(5)) 
    SELECT id
      FROM (
            SELECT DISTINCT
                   mv.id,
                   mv.TagID
              FROM movie mv
             INNER
              JOIN vals v
                ON mv.tagID = v.search_value
           )
     GROUP
        BY id
    HAVING COUNT(1) = (SELECT COUNT(*) FROM vals)

Edit: Just to call it out, since this type of question comes up for me every so often where instead of have a defined list of values I'm interested in the set of values from a column in another table. This approach can be used in that case without having to hardcode anything.

CodePudding user response：

With a slight modification to the query from @a_horse_with_no_name you can avoid double entering the target tags. Use a CTE to define the array then unnest forming the in clause. (See demo)

with tag_list (tags) as 
     (select  array [tag1,tag2,tag3,...]) 
select distinct on (mt.movie_id) movie_id::integer  
  from <table_name> mt
 where mt.tag_id in ( select unnest(tags) from tag_list) 
 group by mt.movie_id 
having array_agg(tag_id) @> (select tags from tag_list);

If running often you can wrap it in a SQL function thus essentially building a parameterized query:

create or replace 
function get_movies_with_all_tags( tags_in integer[]) 
  returns table (movie_id integer)
 language sql 
as $$
    select mt.movie_id 
      from <table_name> mt
     where mt.tag_id in ( select unnest(tags_in)) 
     group by mt.movie_id 
    having array_agg(tag_id) @> tags_in
     order by mt.movie_id;     
$$;