Home > database >  Find movies connected to all given tags
Find movies connected to all given tags

Time:10-09

I don't know how to phrase my SQL question so I will type out an expected scenario.

MovieID TagID
1 1
1 3
1 5
2 1
2 2
2 3
3 1
3 3
3 5
3 7

I want to select movieIds that have tag relationship with at least ALL these tags [1,3,5].
So the expected output would be 1,3.

CodePudding user response:

You can aggregate the tagids into an array and use the contains operator:

select movieid
from the_table
where tagid in (1,3,5)
group by movieid
having array_agg(tagid) @> array[1,3,5]

CodePudding user response:

A classical case of .

Here is an arsenal of query techniques:

The best solution depends on the complete situation. Postgres version, table size, cardinalities, indexes, performance requirements, result columns, ...

Typically among the fastest options:

SELECT movie_id
FROM   tbl t1
JOIN   tbl t2 USING (movie_id)
JOIN   tbl t3 USING (movie_id)
WHERE  t1.tag_id = 1
AND    t2.tag_id = 3
AND    t3.tag_id = 5;

Needs an index on (tag_id), or better yet on (tag_id, movie_id) to be fast.

If tbl implements a many-to-many relationship - see:

Then there should be a PRIMARY KEY or UNIQUE constraint on these two columns. The query at hand prefers (tag_id, movie_id) over (movie_id, tag_id). See:

If column statistics are up to date (see autovacuum, ANALYZE), chances are that Postgres will start with the most selective tag to eliminate non-qualifying movies early.

CodePudding user response:

I'm sure a_horse_with_no_name's answer works, and it's very clean.

Here's an approach that uses more vanilla sql.

WITH  vals (search_value) AS (VALUES (1),(3),(5)) 
    SELECT id
      FROM (
            SELECT DISTINCT
                   mv.id,
                   mv.TagID
              FROM movie mv
             INNER
              JOIN vals v
                ON mv.tagID = v.search_value
           )
     GROUP
        BY id
    HAVING COUNT(1) = (SELECT COUNT(*) FROM vals)

Edit: Just to call it out, since this type of question comes up for me every so often where instead of have a defined list of values I'm interested in the set of values from a column in another table. This approach can be used in that case without having to hardcode anything.

CodePudding user response:

With a slight modification to the query from @a_horse_with_no_name you can avoid double entering the target tags. Use a CTE to define the array then unnest forming the in clause. (See demo)

with tag_list (tags) as 
     (select  array [tag1,tag2,tag3,...]) 
select distinct on (mt.movie_id) movie_id::integer  
  from <table_name> mt
 where mt.tag_id in ( select unnest(tags) from tag_list) 
 group by mt.movie_id 
having array_agg(tag_id) @> (select tags from tag_list);

If running often you can wrap it in a SQL function thus essentially building a parameterized query:

create or replace 
function get_movies_with_all_tags( tags_in integer[]) 
  returns table (movie_id integer)
 language sql 
as $$
    select mt.movie_id 
      from <table_name> mt
     where mt.tag_id in ( select unnest(tags_in)) 
     group by mt.movie_id 
    having array_agg(tag_id) @> tags_in
     order by mt.movie_id;     
$$; 
  • Related