I don't know how to phrase my SQL question so I will type out an expected scenario.
MovieID | TagID |
---|---|
1 | 1 |
1 | 3 |
1 | 5 |
2 | 1 |
2 | 2 |
2 | 3 |
3 | 1 |
3 | 3 |
3 | 5 |
3 | 7 |
I want to select movieIds that have tag relationship with at least ALL these tags [1,3,5]
.
So the expected output would be 1,3
.
CodePudding user response:
You can aggregate the tagids into an array and use the contains operator:
select movieid
from the_table
where tagid in (1,3,5)
group by movieid
having array_agg(tagid) @> array[1,3,5]
CodePudding user response:
A classical case of relational-division.
Here is an arsenal of query techniques:
The best solution depends on the complete situation. Postgres version, table size, cardinalities, indexes, performance requirements, result columns, ...
Typically among the fastest options:
SELECT movie_id
FROM tbl t1
JOIN tbl t2 USING (movie_id)
JOIN tbl t3 USING (movie_id)
WHERE t1.tag_id = 1
AND t2.tag_id = 3
AND t3.tag_id = 5;
Needs an index on (tag_id)
, or better yet on (tag_id, movie_id)
to be fast.
If tbl
implements a many-to-many relationship - see:
Then there should be a PRIMARY KEY
or UNIQUE
constraint on these two columns. The query at hand prefers (tag_id, movie_id)
over (movie_id, tag_id)
. See:
If column statistics are up to date (see autovacuum
, ANALYZE
), chances are that Postgres will start with the most selective tag to eliminate non-qualifying movies early.
CodePudding user response:
I'm sure a_horse_with_no_name's answer works, and it's very clean.
Here's an approach that uses more vanilla sql.
WITH vals (search_value) AS (VALUES (1),(3),(5))
SELECT id
FROM (
SELECT DISTINCT
mv.id,
mv.TagID
FROM movie mv
INNER
JOIN vals v
ON mv.tagID = v.search_value
)
GROUP
BY id
HAVING COUNT(1) = (SELECT COUNT(*) FROM vals)
Edit: Just to call it out, since this type of question comes up for me every so often where instead of have a defined list of values I'm interested in the set of values from a column in another table. This approach can be used in that case without having to hardcode anything.
CodePudding user response:
With a slight modification to the query from @a_horse_with_no_name you can avoid double entering the target tags. Use a CTE to define the array then unnest
forming the in
clause. (See demo)
with tag_list (tags) as
(select array [tag1,tag2,tag3,...])
select distinct on (mt.movie_id) movie_id::integer
from <table_name> mt
where mt.tag_id in ( select unnest(tags) from tag_list)
group by mt.movie_id
having array_agg(tag_id) @> (select tags from tag_list);
If running often you can wrap it in a SQL function thus essentially building a parameterized query:
create or replace
function get_movies_with_all_tags( tags_in integer[])
returns table (movie_id integer)
language sql
as $$
select mt.movie_id
from <table_name> mt
where mt.tag_id in ( select unnest(tags_in))
group by mt.movie_id
having array_agg(tag_id) @> tags_in
order by mt.movie_id;
$$;