Sorry if the question was not phrased awkwardly, I didn't know how to put it without an example.
Given I have a 2 tables, Watch and WorkIn:
Watch: {emailaddress, videoID)
WorkIn: {videoID, castname}
How do I find for example, the email address of people who have watched ALL of 'David' products?
I was thinking of using GROUP BY and ALL but I'm rather new to sql querying and don't really know how to put it all together.
CodePudding user response:
Two tables - one with e-mails and videos watched by e-mail owner and the other with videos and cast names worked in them. As I understand the e-mail owners could watch and cast names could participate in multiple videos with cross over possibilities. Something like in this two tables (as sample data):
WITH
t_watch AS
(
Select '[email protected]' "E_MAIL", 101 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 101 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 102 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 102 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 201 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 101 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 201 "VIDEO_ID" From Dual Union All
Select '[email protected]' "E_MAIL", 301 "VIDEO_ID" From Dual
),
t_work_in AS
(
Select 101 "VIDEO_ID", 'Chriss' "CAST_NAME" From Dual Union All
Select 101 "VIDEO_ID", 'David' "CAST_NAME" From Dual Union All
Select 101 "VIDEO_ID", 'Annie' "CAST_NAME" From Dual Union All
Select 102 "VIDEO_ID", 'Chriss' "CAST_NAME" From Dual Union All
Select 201 "VIDEO_ID", 'Chriss' "CAST_NAME" From Dual Union All
Select 201 "VIDEO_ID", 'David' "CAST_NAME" From Dual Union All
Select 201 "VIDEO_ID", 'Annie' "CAST_NAME" From Dual Union All
Select 202 "VIDEO_ID", 'Annie' "CAST_NAME" From Dual Union All
Select 301 "VIDEO_ID", 'Robert' "CAST_NAME" From Dual
),
The question is about getting the e-mail address of a person watched all the videos participated by specific cast name. To get the answer you should get the lists of:
- unique ordered video ids watched by particular e-mail owner
- unique ordered video ids that particular cast name worked in
To do that create two ctes - cte_watched_by and cte_worked_in:
cte_worked_in AS
(
SELECT DISTINCT
wk.CAST_NAME,
LISTAGG(wk.VIDEO_ID, ', ') WITHIN GROUP (Order By wk.VIDEO_ID) OVER(Partition By wk.CAST_NAME) "WORKED_LIST",
Count(DISTINCT wk.VIDEO_ID) OVER(Partition By wk.CAST_NAME) "COUNT_IDS_WORKED"
FROM
(Select DISTINCT wrk.CAST_NAME, wrk.VIDEO_ID From t_work_in wrk Left Join t_watch wtc ON(wtc.VIDEO_ID = wrk.VIDEO_ID) Order By wrk.VIDEO_ID) wk
),
cte_watched_by AS
(
SELECT DISTINCT
wb.E_MAIL,
LISTAGG(wb.VIDEO_ID, ', ') WITHIN GROUP (Order By wb.VIDEO_ID) OVER(Partition By wb.E_MAIL) "WATCHED_LIST",
Count(DISTINCT wb.VIDEO_ID) OVER(Partition By wb.E_MAIL) "COUNT_IDS_WATCHED"
FROM
(Select DISTINCT wtc.E_MAIL, wtc.VIDEO_ID From t_watch wtc Left Join t_work_in wrk ON(wrk.VIDEO_ID = wtc.VIDEO_ID) Order By wtc.VIDEO_ID) wb
)
This is what we have got so far:
cte_worked_in
CAST_NAME | WORKED_LIST | COUNT_IDS_WORKED |
---|---|---|
Chriss | 101, 102, 201 | 3 |
Robert | 301 | 1 |
Annie | 101, 201, 202 | 3 |
David w101, 201 | 2 |
cte_watched_by
E_MAIL | WATCHED_LIST | COUNT_IDS_WATCHED |
---|---|---|
[email protected] | 101, 201 | 2 |
[email protected] | 101, 201 | 2 |
[email protected] | 102 | 1 |
[email protected] | 101, 102, 301 | 3 |
Now we can join those ctes and get the answer - below is main SQL:
SELECT DISTINCT
wk.CAST_NAME "WORKED_IN",
wk.COUNT_IDS_WORKED "COUNT_IDS_WORKED",
wk.WORKED_LIST,
wb.E_MAIL "WATCHED_BY",
wb.COUNT_IDS_WATCHED "COUNT_IDS_WATCHED",
wb.WATCHED_LIST
FROM
cte_worked_in wk
INNER JOIN
cte_watched_by wb ON
(
(wk.COUNT_IDS_WORKED = wb.COUNT_IDS_WATCHED And wk.WORKED_LIST = wb.WATCHED_LIST)
OR
(wk.COUNT_IDS_WORKED = 1 And wb.COUNT_IDS_WATCHED > 1 And InStr(wb.WATCHED_LIST, wk.WORKED_LIST) > 0)
OR
(wk.COUNT_IDS_WORKED = 2 And wb.COUNT_IDS_WATCHED > 2 And InStr(wb.WATCHED_LIST, SubStr(wk.WORKED_LIST, 1, 3)) > 0 And InStr(wb.WATCHED_LIST, SubStr(wk.WORKED_LIST, 6, 3)) > 0)
)
... and here is resulting dataset:
WORKED_IN | COUNT_IDS_WORKED | WORKED_LIST | WATCHED_BY | COUNT_IDS_WATCHED | WATCHED_LIST |
---|---|---|---|---|---|
Robert | 1 | 301 | [email protected] | 3 | 101, 102, 301 |
David | 2 | 101, 201 | [email protected] | 2 | 101, 201 |
David | 2 | 101, 201 | [email protected] | 2 | 101, 201 |
The ORs within ON expression of INNER JOIN serve to cover situations where somebody watch a lot of movies with cast names that works in just few movies. It is the weakest part of this answer.
The resulting dataset could be filtered (if needed) to show just results of a particular cast name and/or e-mail.
Regards...
CodePudding user response:
Wasn't able to test this but I think this should work:
SELECT emailaddress, count(*)
FROM Watch w inner join WorkIn wi on w.videoID = wi.videoID
WHERE wi.castName = 'Chris Evans'
GROUP BY emailaddress
HAVING count(*) = (select count(*) from WorkIn where wi.castName = 'Chris Evans')
The group by will count for each emailaddress the number of videos seen with Chris Evans.
HAVING is like a WHERE clause on the aggregated result (so after the GROUP BY). Only when this count is equal to the total number of videos in which Chris Evans plays, the emailaddress is returned.