Home > Mobile >  Can I use the ALL operator to test that all values of a "group" exist in another query/set
Can I use the ALL operator to test that all values of a "group" exist in another query/set

Time:03-11

  • I have this database for a CS/database-theory homework question for a hypothetical movie store company:
    • For those who might be unfamiliar with the concept, a movie store is a retail location where patrons can film productions on VHS tape, or this newfangled format called "DVD".
      • Who knows, maybe some time in the distant, far-off, future we might be able to view movies directly over the Internet?
    • The DDL and sample data is below.
  • I need to write a query that will show all movies that are available in all three Chicago stores: (WI01, WI02, and WI03).
    • By looking at the raw data below ourselves, we can see that only these 3 movieId values (D00001, D00006, and D00007) have movie_store rows for every store located in Chicago.
CREATE TABLE movie (
    movieId varchar(6)   NOT NULL PRIMARY KEY, 
    title   nvarchar(50) NOT NULL
);

CREATE TABLE store (
    storeId varchar(4)   NOT NULL PRIMARY KEY, 
    city    nvarchar(20) NOT NULL
);

CREATE TABLE movie_store (
    movieid varchar(6) FOREIGN KEY REFERENCES movie ( movieId ),
    storeid varchar(4) FOREIGN KEY REFERENCES store ( storeId ),

    PRIMARY KEY ( movieId, storeId )
);

GO

INSERT INTO movie ( movieId, title )
VALUES
('D00001', N'True Lies'),
('D00002', N'Predator'),
('D00003', N'Last Action Hero'),
('D00004', N'Red Heat'),
('D00005', N'Conan 1'),
('D00006', N'Conan 2'),
('D00007', N'Red Sonja');

INSERT INTO store ( storeId, city ) VALUES
('WI01', N'Chicago'),
('WI02', N'Chicago'),
('WI03', N'Chicago'),
('IL01', N'Atlanta'),
('IL02', N'Nashville');

INSERT INTO movie_store ( movieId, storeId ) VALUES
-- True Lies:
('D00001', 'WI01'),
('D00001', 'WI02'),
('D00001', 'WI03'),
-- 'Predator:
('D00002', 'IL01'),
('D00002', 'IL02'),
-- Last Action Hero:
('D00003', 'WI01'),
-- Red Heat:
('D00004', 'WI01'),
('D00004', 'WI02'),
('D00004', 'IL02'),
-- Conan 1:
('D00005', 'WI01'),
('D00005', 'WI02'),
-- Conan 2:
('D00006', 'WI01'),
('D00006', 'WI02'),
('D00006', 'WI03'),
-- Red Sonja:
('D00007', 'WI01'),
('D00007', 'WI02'),
('D00007', 'WI03'); 
  • During my problem-solving research I found a site explaining the ALL operator.
    • My query is getting unique storeIds for Chicago.
    • It is then trying to get the movie title with a storeId record for each one of the Chicago locations.
WITH chicagoStores AS (
    SELECT DISTINCT
        storeId
    FROM
        store 
    WHERE
        city = 'Chicago'
)
SELECT DISTINCT
    m.title
FROM
    movie AS m
    INNER JOIN movie_store AS y ON m.movieid = y.movieid
    INNER JOIN store       AS s ON y.storeid = s.storeid
WHERE
    s.storeId = ALL( SELECT storeId FROM chicagoStores ) 

But my query returns zero rows (and no errors), am I misunderstanding the ALL operator?

CodePudding user response:

It appears I had the wrong idea about ALL. I realized I could write my query this way to get the movies that appeared in all Chicago locations. Thanks for your help everyone.

with stores as (
    select count(distinct(storeid)) as count_store
    from store 
    where city = 'Chicago'
),
     count_movies as (
    select z.title, count(*) as count
    from movie z
    join movie_store y on (z.movieid = y.movieid)
    join store x on (y.storeid = x.storeid)
    where x.city = 'Chicago'
    group by z.title
    having count(*) = (select count_store from stores)
)
select title from count_movies

CodePudding user response:

This can be done without a window and without mentioning chicago 2 times

select m.title,
       s.city
from   store s
  inner join movie_store ms on s.storeid = ms.storeid
  inner join movie m on ms.movieid = m.movieid
  cross apply (select count(1) numbers from store s2 where s2.city = s.city group by s2.city) c
where  s.city = 'Chicago'
group by m.title, s.city, c.numbers
having  count(s.storeid) = c.numbers

Try it yourself at this DBFiddle

CodePudding user response:

After plugging-away for a bit more, and reading this article on Relational Division, I think I've found a much shorter query compared to my original answer:

SELECT
    m.movieId,
    m.title
FROM
    movie AS m
WHERE
    NOT EXISTS (
        SELECT m.movieId, s.storeId FROM store       AS s WHERE city = 'Chicago'
        EXCEPT
        SELECT r.movieId, r.storeId FROM movie_store AS r WHERE r.movieId = m.movieId
    );

The explanation below is based on this slightly longer-form version which is otherwise identical in its relational-calculus:

WITH chicagoStores AS (
    SELECT storeId FROM store WHERE city = 'Chicago'
)
SELECT
    m.movieId,
    m.title
FROM
    movie AS m
WHERE
    NOT EXISTS (
        -- Generate rows we'd *expect to exist* if a given `m.movieId` is present in every Chicago storeId:
        SELECT
            m.movieId,
            s.storeId
        FROM
            chicagoStores AS s

        -- Then subtract the `movie_store` rows that *actually exist* for this m.movieId:
        EXCEPT

        SELECT
            a.movieId,
            a.storeId
        FROM
            movie_store AS a
        WHERE
            a.movieId = m.movieId
    );
  • Generate the same chicagoStores set.
  • Then, filter movieId values in movie (in WHERE) by...
    • Generate a set of hypothetical movieId, storeId tuples (for that specific movieId value) from chicagoStores...
    • ...and subtract that from the actual rows in movie_store (using EXCEPT)...
      • This is the same thing as whatIfEveryChicagoStoreHadEveryMovie from my original answer, but computed on a a per-row basis, instead of (conceptually) generating the ( movies X stores ) EXCEPT movie_store Cartesian Product subtraction all-at-once.
        • ...with the implication that this might require less maximum total memory on the database-server, but that's entirely dependent on the execution-plan and DB engine.
    • ...and if there are any hypothetical rows remaining (using EXISTS) after the EXCEPT then it means that the movieId is not available in all Chicagoan stores.
    • So if we invert the predicate (NOT EXISTS) then that means we can test to see if a specific movieId value is available in all Chicagoan stores.
      • But be careful as it would also have false-posities when/if the chicagoStores collection was empty (as NOT EXISTS ( x EXCEPT y ) is always true when x is empty, even if y is also empty.
        • As a workaround, change FROM movie to FROM anyChicagoStoreMovieIds (where anyChicagoStoreMovieIds is defined in my other answer).

CodePudding user response:

  • The x = ALL y operator isn't what you want.

    • It sounds like you want an operator to test that "all values in x are also in y", aka an "ALL IN" operator.
    • The x = ALL y operator actually tests to see if all values in y (the right-hand-side single-column table-expression) equal the single scalar value x.
      • This functionality is not relevant to the problem as we don't need to test that some result-list equals some single row's column.
      • The other operators aren't of much use either (e.g. x != ALL y or x < ALL y).
  • While there is no ALL IN operator in SQL, the concept exists in relational-algebra where it's known as relational division.


From a high-level, the problem can be broken-down into 4 steps:

  1. Get the set of storeId values for Stores in Chicago.
    • i.e. SELECT storeId FROM store WHERE city = 'Chicago'
  2. Separately, get the set of movieId values for all movies in those stores.
    • i.e. SELECT * FROM movie_store AS ms INNER JOIN chicagoStores AS cs ON ms.storeId = cs.storeId
  3. Then group/partition the chicagoMovies set by their separate movieId values.
  4. Then filter out those groups/partitions where each partition's set of storeId values does not equal the chicagoStores set.

But here's the hard-part: SQL does not offer a way to evalaute a predicate condition for each partition in a GROUP BY query.

SQL is more like Relational Calculus, where you describe what you want, as opposed to Relational Algebra, where you describe how you want it done. Linq is an example of a relational-algebra query language in comparison to SQL.

In Linq (for in-memory objects, not Entity Framework), you would do it like this:

HashSet<StoreId> chicagoStores = Stores
    .Where( s => s.City == "Chicago" )
    .Select( s => s.StoreId )
    .ToHashSet();

MovieStores
    .GroupBy( ms => ms.StoreId )
    .Where( grp => grp.All( ms => chicagoStores.Contains( ms.StoreId ) ) )
    .SelectMany( grp => grp )
    .Select( ms => ms.Movie )

So I went for a completely different approach in SQL:

  1. Get the set of storeId values for Stores in Chicago:

    WITH chicagoStores AS (
        SELECT
            storeId
        FROM
            store
         WHERE
            city = 'Chicago'
    )
    
    StoreId
    -------
    WI01
    WI02
    WI03
    
  2. Get the set of movieId values for movies that are in at least 1 Chicago store.

    WITH anyChicagoStoreMovieIds AS (
        SELECT DISTINCT
            ms.movieId
        FROM
            movie_store AS ms
            INNER JOIN chicagoStores AS cs ON
                cs.storeId = ms.storeId
    )
    
    movieId
    -------
    D00001
    D00003
    D00004
    D00005
    D00006
    D00007
    
    • Alternatively, the set of all movieId values in movie could be used, but doing that would make Step 3 potentially much slower.
  3. Generate the CROSS APPLY of Step 1 and Step 2, which generates the Cartesian Product of every Chicagoan storeId with every movieId.

    • Hence why restricting it to a the smaller upper-bound set from Step 2 makes sense, as there's no point including movieId values that don't appear in any Chicago store.
    WITH whatIfEveryChicagoStoreHadEveryMovie AS (
        SELECT
            m.movieId,
            cs.movieId
        FROM
            chicagoStores AS cs
            CROSS APPLY anyChicagoStoreMovieIds AS m
    )
    
    movieId  storeId
    -------
    D00001   WI01
    D00001   WI02
    D00001   WI03
    D00003   WI01
    D00003   WI02
    D00003   WI03
    D00004   WI01
    D00004   WI02
    D00004   WI03
    D00005   WI01
    D00005   WI02
    D00005   WI03
    D00006   WI01
    D00006   WI02
    D00006   WI03
    D00007   WI01
    D00007   WI02
    D00007   WI03
    
  4. Now the hard part:

    • Consider that if any given Chicago storeId had every movieId possible, then such a row would already-exist in movie_store...
    • ...therefore it follows that if we then subtract actual rows in movie_store from Step 3's result, then we'll be left with the set of ( movieId, storeId ) tuples that don't exist but which would need to exist in order for every Chicago storeId to have that movie.
    WITH chicagoanMoviesNotAvailableAtEveryChicagoanStore AS (
        SELECT
            w.movieId,
            w.storeId
        FROM
            whatIfEveryChicagoStoreHadEveryMovie AS w
            LEFT OUTER JOIN movie_store AS ms ON
                w.movieId = ms.movieId
                AND
                w.storeId = ms.storeId
        WHERE
            ms.storeId IS NULL
    )
    
    movieId  storeId
    ----------------
    D00003   WI02
    D00003   WI03
    D00004   WI03
    D00005   WI03
    
  5. Then it's just a matter of subtracting chicagoanMoviesNotAvailableAtEveryChicagoanStore from anyChicagoStoreMovieIds (from Step 3), which gives us the set of movieId values that are available at every Chicagoan storeId:

    WITH moviesNotIn_moviesNotInAtLeast1ChicagoStore AS (
        SELECT movieId FROM anyChicagoStoreMovieIds
        EXCEPT
        SELECT movieId FROM moviesNotInAtLeast1ChicagoStore
    )
    
    movieId  storeId
    ----------------
    D00003   WI02
    D00003   WI03
    D00004   WI03
    D00005   WI03
    
  6. Which can then be INNER JOINed with movie to get their title information, etc:

     SELECT
         m.movieId,
         m.title
     FROM
         moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs
         INNER JOIN movie AS m ON
             ffs.movieId = m.movieId;
    
    movieId  title
    ----------------
    D00001   'True Lies'
    D00006   'Conan 2'
    D00007   'Red Sonja'
    

Thus giving the full final query:

WITH
chicagoStores AS (
    SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
    SELECT DISTINCT
        ms.movieId
    FROM
        movie_store AS ms
        INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
  SELECT
    m.movieId,
    cs.storeId
  FROM
    chicagoStores AS cs
    CROSS JOIN anyChicagoStoreMovieIds AS m
),
moviesNotInAtLeast1ChicagoStore AS (
  SELECT
    e.*
  FROM
    expectedMovieStores AS e
    LEFT OUTER JOIN movie_store AS ms ON
      e.movieId = ms.movieId
      AND
      e.storeId = ms.storeId
  WHERE
    ms.storeId IS NULL
),
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS (
    SELECT
        movieId
    FROM
        anyChicagoStoreMovieIds
         
    EXCEPT
         
    SELECT
        movieId
    FROM
        moviesNotInAtLeast1ChicagoStore
)
SELECT
    m.movieId,
    m.title
FROM
    moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs
    INNER JOIN movie AS m ON
        ffs.movieId = m.movieId;

The CTEs that are only used once can be inlined to shorten the query somewhat:

WITH
chicagoStores AS (
    SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
    SELECT DISTINCT
        ms.movieId
    FROM
        movie_store AS ms
        INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
    SELECT
        m.movieId,
        cs.storeId
    FROM
        chicagoStores AS cs
        CROSS JOIN anyChicagoStoreMovieIds AS m
)
SELECT
    m.movieId,
    m.title
FROM
    (
        SELECT
            movieId
        FROM
            anyChicagoStoreMovieIds

        EXCEPT

        SELECT
            e.movieId
        FROM
            expectedMovieStores AS e
            LEFT OUTER JOIN movie_store AS ms ON
                e.movieId = ms.movieId
                AND
                e.storeId = ms.storeId
        WHERE
            ms.storeId IS NULL
    ) AS ffs
    INNER JOIN movie AS m ON
        ffs.movieId = m.movieId;

...which is stioll rather long and complex for what's described as a CS/SQL homework - this took me over 2 hours to figure out because it was driving me mad.

CodePudding user response:

Try this

select city,title,count(*)
from 
store
inner join movie_store
on store.storeid = movie_store.storeid
inner join movie
on movie_store.movieid = movie.movieid
where city = 'Chicago'
group by city,title
having count(*) = 3
  • Related