I have two tables:
- pages_interests
- page_id INT NOT NULL
- interest_id INT NOT NULL
- items_interests
- item_id INT NOT NULL
- interest_id INT NOT NULL
pages_interest:
page_id | interest_id |
---|---|
1 | 1 |
1 | 7 |
2 | 1 |
3 | 1 |
3 | 7 |
3 | 89 |
items_interest:
item_id | interest_id |
---|---|
10 | 1 |
10 | 7 |
10 | 20 |
12 | 1 |
12 | 55 |
I'm trying to figure out how to get SQL to join on multiple rows. Because the page expected items to have an interest of 1 and 7, get items that have both those interests, but don't discard when an item has other interests too. The expected output would be:
page_id | item_id |
---|---|
1 | 10 |
2 | 10 |
2 | 12 |
Does anyone have any idea how I could achieve this?
CodePudding user response:
Using an INNER JOIN
between the two tables, will make appear only interest_id in common between the two tables. To gather only <page_id, item_id> that have both the two pages, it's sufficient to enforcing COUNT(DISTINCT i.interest_id) = 2
inside the HAVING
clause.
SELECT p.page_id, i.item_id
FROM pages_interest p
INNER JOIN items_interest i
ON p.interest_id = i.interest_id
GROUP BY p.page_id, i.item_id
HAVING COUNT(DISTINCT i.interest_id) = 2
If you want to generalize on the number of "pages_interest" items, you can do:
SELECT p.page_id, i.item_id
FROM pages_interest p
INNER JOIN items_interest i
ON p.interest_id = i.interest_id
GROUP BY p.page_id, i.item_id
HAVING COUNT(DISTINCT i.interest_id) = (SELECT COUNT(DISTINCT interest_id) FROM pages_interest)
Output:
page_id | item_id |
---|---|
1 | 10 |
Check the demo here.
CodePudding user response:
I think something like this might work. I added a couple of more pages for more realistic test:
;with pages as (
select *
from (
VALUES (1, 1)
, (1, 7)
, (2, 1)
, (3, 1)
, (3, 7)
, (3, 89)
, (4, 20)
, (5, 55)
, (5, 1)
, (6, 1)
, (6, 13)
) t (page_id,interest_id)
)
, items as (
select *
from (
VALUES (10, 1)
, (10, 7)
, (10, 20)
, (12, 1)
, (12, 55)
) t (item_id,interest_id)
)
select p.page_id, i.item_id
from (
select p.page_id, interest_id, COUNT(*) OVER(PARTITION BY page_id) AS total_interests
FROM pages p
) p
LEFT JOIN items i
ON i.interest_id = p.interest_id
group by p.page_id, i.item_id, p.total_interests
HAVING COUNT(i.item_id) >= p.total_interests
The idea is to keep track of total page interests and then make sure it's no less than item counts (if we miss, LEFT JOIN value becomes null and COUNT ignores is).