- I have this database for a CS/database-theory homework question for a hypothetical movie store company:
- For those who might be unfamiliar with the concept, a movie store is a retail location where patrons can film productions on VHS tape, or this newfangled format called "DVD".
- Who knows, maybe some time in the distant, far-off, future we might be able to view movies directly over the Internet?
- The DDL and sample data is below.
- For those who might be unfamiliar with the concept, a movie store is a retail location where patrons can film productions on VHS tape, or this newfangled format called "DVD".
- I need to write a query that will show all movies that are available in all three Chicago stores: (
WI01
,WI02
, andWI03
).- By looking at the raw data below ourselves, we can see that only these 3
movieId
values (D00001
,D00006
, andD00007
) havemovie_store
rows for everystore
located in Chicago.
- By looking at the raw data below ourselves, we can see that only these 3
CREATE TABLE movie (
movieId varchar(6) NOT NULL PRIMARY KEY,
title nvarchar(50) NOT NULL
);
CREATE TABLE store (
storeId varchar(4) NOT NULL PRIMARY KEY,
city nvarchar(20) NOT NULL
);
CREATE TABLE movie_store (
movieid varchar(6) FOREIGN KEY REFERENCES movie ( movieId ),
storeid varchar(4) FOREIGN KEY REFERENCES store ( storeId ),
PRIMARY KEY ( movieId, storeId )
);
GO
INSERT INTO movie ( movieId, title )
VALUES
('D00001', N'True Lies'),
('D00002', N'Predator'),
('D00003', N'Last Action Hero'),
('D00004', N'Red Heat'),
('D00005', N'Conan 1'),
('D00006', N'Conan 2'),
('D00007', N'Red Sonja');
INSERT INTO store ( storeId, city ) VALUES
('WI01', N'Chicago'),
('WI02', N'Chicago'),
('WI03', N'Chicago'),
('IL01', N'Atlanta'),
('IL02', N'Nashville');
INSERT INTO movie_store ( movieId, storeId ) VALUES
-- True Lies:
('D00001', 'WI01'),
('D00001', 'WI02'),
('D00001', 'WI03'),
-- 'Predator:
('D00002', 'IL01'),
('D00002', 'IL02'),
-- Last Action Hero:
('D00003', 'WI01'),
-- Red Heat:
('D00004', 'WI01'),
('D00004', 'WI02'),
('D00004', 'IL02'),
-- Conan 1:
('D00005', 'WI01'),
('D00005', 'WI02'),
-- Conan 2:
('D00006', 'WI01'),
('D00006', 'WI02'),
('D00006', 'WI03'),
-- Red Sonja:
('D00007', 'WI01'),
('D00007', 'WI02'),
('D00007', 'WI03');
- During my problem-solving research I found a site explaining the
ALL
operator.- My query is getting unique
storeIds
for Chicago. - It is then trying to get the movie
title
with astoreId
record for each one of the Chicago locations.
- My query is getting unique
WITH chicagoStores AS (
SELECT DISTINCT
storeId
FROM
store
WHERE
city = 'Chicago'
)
SELECT DISTINCT
m.title
FROM
movie AS m
INNER JOIN movie_store AS y ON m.movieid = y.movieid
INNER JOIN store AS s ON y.storeid = s.storeid
WHERE
s.storeId = ALL( SELECT storeId FROM chicagoStores )
But my query returns zero rows (and no errors), am I misunderstanding the ALL
operator?
CodePudding user response:
It appears I had the wrong idea about ALL. I realized I could write my query this way to get the movies that appeared in all Chicago locations. Thanks for your help everyone.
with stores as (
select count(distinct(storeid)) as count_store
from store
where city = 'Chicago'
),
count_movies as (
select z.title, count(*) as count
from movie z
join movie_store y on (z.movieid = y.movieid)
join store x on (y.storeid = x.storeid)
where x.city = 'Chicago'
group by z.title
having count(*) = (select count_store from stores)
)
select title from count_movies
CodePudding user response:
This can be done without a window and without mentioning chicago
2 times
select m.title,
s.city
from store s
inner join movie_store ms on s.storeid = ms.storeid
inner join movie m on ms.movieid = m.movieid
cross apply (select count(1) numbers from store s2 where s2.city = s.city group by s2.city) c
where s.city = 'Chicago'
group by m.title, s.city, c.numbers
having count(s.storeid) = c.numbers
Try it yourself at this DBFiddle
CodePudding user response:
After plugging-away for a bit more, and reading this article on Relational Division, I think I've found a much shorter query compared to my original answer:
SELECT
m.movieId,
m.title
FROM
movie AS m
WHERE
NOT EXISTS (
SELECT m.movieId, s.storeId FROM store AS s WHERE city = 'Chicago'
EXCEPT
SELECT r.movieId, r.storeId FROM movie_store AS r WHERE r.movieId = m.movieId
);
The explanation below is based on this slightly longer-form version which is otherwise identical in its relational-calculus:
WITH chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
)
SELECT
m.movieId,
m.title
FROM
movie AS m
WHERE
NOT EXISTS (
-- Generate rows we'd *expect to exist* if a given `m.movieId` is present in every Chicago storeId:
SELECT
m.movieId,
s.storeId
FROM
chicagoStores AS s
-- Then subtract the `movie_store` rows that *actually exist* for this m.movieId:
EXCEPT
SELECT
a.movieId,
a.storeId
FROM
movie_store AS a
WHERE
a.movieId = m.movieId
);
- Generate the same
chicagoStores
set. - Then, filter
movieId
values inmovie
(inWHERE
) by...- Generate a set of hypothetical
movieId, storeId
tuples (for that specificmovieId
value) fromchicagoStores
... - ...and subtract that from the actual rows in
movie_store
(usingEXCEPT
)...- This is the same thing as
whatIfEveryChicagoStoreHadEveryMovie
from my original answer, but computed on a a per-row basis, instead of (conceptually) generating the( movies X stores ) EXCEPT movie_store
Cartesian Product subtraction all-at-once.- ...with the implication that this might require less maximum total memory on the database-server, but that's entirely dependent on the execution-plan and DB engine.
- This is the same thing as
- ...and if there are any hypothetical rows remaining (using
EXISTS
) after theEXCEPT
then it means that themovieId
is not available in all Chicagoan stores. - So if we invert the predicate (
NOT EXISTS
) then that means we can test to see if a specificmovieId
value is available in all Chicagoan stores.- But be careful as it would also have false-posities when/if the
chicagoStores
collection was empty (asNOT EXISTS ( x EXCEPT y )
is alwaystrue
whenx
is empty, even ify
is also empty.- As a workaround, change
FROM movie
toFROM anyChicagoStoreMovieIds
(whereanyChicagoStoreMovieIds
is defined in my other answer).
- As a workaround, change
- But be careful as it would also have false-posities when/if the
- Generate a set of hypothetical
CodePudding user response:
The
x = ALL y
operator isn't what you want.- It sounds like you want an operator to test that "all values in x are also in y", aka an "
ALL IN
" operator.- Unfortunately
ALL IN
does not exist in SQL, despite its obvious utility.
- Unfortunately
- The
x = ALL y
operator actually tests to see if all values iny
(the right-hand-side single-column table-expression) equal the single scalar valuex
.- This functionality is not relevant to the problem as we don't need to test that some result-list equals some single row's column.
- The other operators aren't of much use either (e.g.
x != ALL y
orx < ALL y
).
- It sounds like you want an operator to test that "all values in x are also in y", aka an "
While there is no
ALL IN
operator in SQL, the concept exists in relational-algebra where it's known as relational division.- Conceptually (and if it existed in SQL) then
x DIVIDE y ON y.foo = x.foo
would by likeGROUP BY x.foo HAVING x.foo ALL IN ( SELECT foo FROM y ) UNGROUP
(of course,UNGROUP
isn't a thing either). - It's a common PITA to implement relational-division in SQL.
- Conceptually (and if it existed in SQL) then
From a high-level, the problem can be broken-down into 4 steps:
- Get the set of
storeId
values for Stores in Chicago.- i.e.
SELECT storeId FROM store WHERE city = 'Chicago'
- i.e.
- Separately, get the set of
movieId
values for all movies in those stores.- i.e.
SELECT * FROM movie_store AS ms INNER JOIN chicagoStores AS cs ON ms.storeId = cs.storeId
- i.e.
- Then group/partition the
chicagoMovies
set by their separatemovieId
values. - Then filter out those groups/partitions where each partition's set of
storeId
values does not equal thechicagoStores
set.
But here's the hard-part: SQL does not offer a way to evalaute a predicate condition for each partition in a GROUP BY
query.
SQL is more like Relational Calculus, where you describe what you want, as opposed to Relational Algebra, where you describe how you want it done. Linq is an example of a relational-algebra query language in comparison to SQL.
In Linq (for in-memory objects, not Entity Framework), you would do it like this:
HashSet<StoreId> chicagoStores = Stores
.Where( s => s.City == "Chicago" )
.Select( s => s.StoreId )
.ToHashSet();
MovieStores
.GroupBy( ms => ms.StoreId )
.Where( grp => grp.All( ms => chicagoStores.Contains( ms.StoreId ) ) )
.SelectMany( grp => grp )
.Select( ms => ms.Movie )
So I went for a completely different approach in SQL:
Get the set of
storeId
values for Stores in Chicago:WITH chicagoStores AS ( SELECT storeId FROM store WHERE city = 'Chicago' ) StoreId ------- WI01 WI02 WI03
Get the set of
movieId
values for movies that are in at least 1 Chicago store.WITH anyChicagoStoreMovieIds AS ( SELECT DISTINCT ms.movieId FROM movie_store AS ms INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId ) movieId ------- D00001 D00003 D00004 D00005 D00006 D00007
- Alternatively, the set of all
movieId
values inmovie
could be used, but doing that would make Step 3 potentially much slower.
- Alternatively, the set of all
Generate the
CROSS APPLY
of Step 1 and Step 2, which generates the Cartesian Product of every ChicagoanstoreId
with everymovieId
.- Hence why restricting it to a the smaller upper-bound set from Step 2 makes sense, as there's no point including
movieId
values that don't appear in any Chicago store.
WITH whatIfEveryChicagoStoreHadEveryMovie AS ( SELECT m.movieId, cs.movieId FROM chicagoStores AS cs CROSS APPLY anyChicagoStoreMovieIds AS m ) movieId storeId ------- D00001 WI01 D00001 WI02 D00001 WI03 D00003 WI01 D00003 WI02 D00003 WI03 D00004 WI01 D00004 WI02 D00004 WI03 D00005 WI01 D00005 WI02 D00005 WI03 D00006 WI01 D00006 WI02 D00006 WI03 D00007 WI01 D00007 WI02 D00007 WI03
- Hence why restricting it to a the smaller upper-bound set from Step 2 makes sense, as there's no point including
Now the hard part:
- Consider that if any given Chicago
storeId
had everymovieId
possible, then such a row would already-exist inmovie_store
... - ...therefore it follows that if we then subtract actual rows in
movie_store
from Step 3's result, then we'll be left with the set of( movieId, storeId )
tuples that don't exist but which would need to exist in order for every ChicagostoreId
to have that movie.
WITH chicagoanMoviesNotAvailableAtEveryChicagoanStore AS ( SELECT w.movieId, w.storeId FROM whatIfEveryChicagoStoreHadEveryMovie AS w LEFT OUTER JOIN movie_store AS ms ON w.movieId = ms.movieId AND w.storeId = ms.storeId WHERE ms.storeId IS NULL ) movieId storeId ---------------- D00003 WI02 D00003 WI03 D00004 WI03 D00005 WI03
- Consider that if any given Chicago
Then it's just a matter of subtracting
chicagoanMoviesNotAvailableAtEveryChicagoanStore
fromanyChicagoStoreMovieIds
(from Step 3), which gives us the set ofmovieId
values that are available at every ChicagoanstoreId
:WITH moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ( SELECT movieId FROM anyChicagoStoreMovieIds EXCEPT SELECT movieId FROM moviesNotInAtLeast1ChicagoStore ) movieId storeId ---------------- D00003 WI02 D00003 WI03 D00004 WI03 D00005 WI03
Which can then be
INNER JOIN
ed withmovie
to get theirtitle
information, etc:SELECT m.movieId, m.title FROM moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs INNER JOIN movie AS m ON ffs.movieId = m.movieId; movieId title ---------------- D00001 'True Lies' D00006 'Conan 2' D00007 'Red Sonja'
Thus giving the full final query:
WITH
chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
),
moviesNotInAtLeast1ChicagoStore AS (
SELECT
e.*
FROM
expectedMovieStores AS e
LEFT OUTER JOIN movie_store AS ms ON
e.movieId = ms.movieId
AND
e.storeId = ms.storeId
WHERE
ms.storeId IS NULL
),
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS (
SELECT
movieId
FROM
anyChicagoStoreMovieIds
EXCEPT
SELECT
movieId
FROM
moviesNotInAtLeast1ChicagoStore
)
SELECT
m.movieId,
m.title
FROM
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
The CTEs that are only used once can be inlined to shorten the query somewhat:
WITH
chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
)
SELECT
m.movieId,
m.title
FROM
(
SELECT
movieId
FROM
anyChicagoStoreMovieIds
EXCEPT
SELECT
e.movieId
FROM
expectedMovieStores AS e
LEFT OUTER JOIN movie_store AS ms ON
e.movieId = ms.movieId
AND
e.storeId = ms.storeId
WHERE
ms.storeId IS NULL
) AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
...which is stioll rather long and complex for what's described as a CS/SQL homework - this took me over 2 hours to figure out because it was driving me mad.
CodePudding user response:
Try this
select city,title,count(*)
from
store
inner join movie_store
on store.storeid = movie_store.storeid
inner join movie
on movie_store.movieid = movie.movieid
where city = 'Chicago'
group by city,title
having count(*) = 3