Home > Software design >  How to query only values that show up in the database more than once
How to query only values that show up in the database more than once

Time:12-31

I am trying to do a market-basket analysis for sales data. For this, I need two values, order_id and product_id.

I only want the order_id and the product_id when there are duplicate values of each not necessarily in the same row, but I don't want to get rid of those values, I want to get rid of the order_id and product_id if they only show in the database once. It should be ordered by order_id. The result should look something like this

from this

order_id product_id
1 1
1 2
1 3
1 4
2 1
2 2
2 3
3 1
3 2
3 5
4 2

to this

order_id product_id
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
SELECT order_id, product_id
FROM order_items
ORDER BY order_id

CodePudding user response:

I am sure there is a far more elegant solution, but if I understand your question to want rows that have either an order_id or product_id that shows up more than once in the DB, then I believe this will work:

EDIT: If you need them both to have multiple items in the list then I would just individually join them to a subquery as such:

SELECT t1.order_id, t1.product_id
FROM order_items  t1
INNER JOIN (
        SELECT ORDER_ID from order_items  group by ORDER_ID having count(*) > 1) t2 
    ON t1.order_id = t2.order_id
INNER JOIN (
        SELECT PRODUCT_ID from order_items  group by PRODUCT_ID having count(*) > 1) t3 
    ON t1.product_id = t3.product_id
ORDER BY order_id

CodePudding user response:

A couple of windowed aggregates would probably work best™ here:

WITH CTE AS(
    SELECT order_id,
           product_id,
           COUNT(1) OVER (PARTITION BY order_id) AS orders,
           COUNT(1) OVER (PARTITION BY product_id) AS products
    FROM dbo.YourTable)
SELECT order_id,
       product_id
FROM CTE
WHERE orders > 1
  AND products > 1;

CodePudding user response:

in keyword can make the query simpler, but I'm not sure about the performance...

select order_id, product_id from order_items
where order_id not in (
  select order_id from order_items group by order_id having count(*) = 1
) and product_id not in (
  select product_id from order_items group by product_id having count(*) = 1
);
  • Related