I am trying to do a market-basket analysis for sales data. For this, I need two values, order_id
and product_id
.
I only want the order_id
and the product_id
when there are duplicate values of each not necessarily in the same row, but I don't want to get rid of those values, I want to get rid of the order_id and product_id if they only show in the database once. It should be ordered by order_id
. The result should look something like this
from this
order_id | product_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 2 |
2 | 3 |
3 | 1 |
3 | 2 |
3 | 5 |
4 | 2 |
to this
order_id | product_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 2 |
2 | 3 |
3 | 1 |
3 | 2 |
SELECT order_id, product_id
FROM order_items
ORDER BY order_id
CodePudding user response:
I am sure there is a far more elegant solution, but if I understand your question to want rows that have either an order_id or product_id that shows up more than once in the DB, then I believe this will work:
EDIT: If you need them both to have multiple items in the list then I would just individually join them to a subquery as such:
SELECT t1.order_id, t1.product_id
FROM order_items t1
INNER JOIN (
SELECT ORDER_ID from order_items group by ORDER_ID having count(*) > 1) t2
ON t1.order_id = t2.order_id
INNER JOIN (
SELECT PRODUCT_ID from order_items group by PRODUCT_ID having count(*) > 1) t3
ON t1.product_id = t3.product_id
ORDER BY order_id
CodePudding user response:
A couple of windowed aggregates would probably work best™ here:
WITH CTE AS(
SELECT order_id,
product_id,
COUNT(1) OVER (PARTITION BY order_id) AS orders,
COUNT(1) OVER (PARTITION BY product_id) AS products
FROM dbo.YourTable)
SELECT order_id,
product_id
FROM CTE
WHERE orders > 1
AND products > 1;
CodePudding user response:
in
keyword can make the query simpler, but I'm not sure about the performance...
select order_id, product_id from order_items
where order_id not in (
select order_id from order_items group by order_id having count(*) = 1
) and product_id not in (
select product_id from order_items group by product_id having count(*) = 1
);