Home > Software engineering >  SQL select all purchases that contain at least two products from a alist
SQL select all purchases that contain at least two products from a alist

Time:09-06

I would like to write a SQL query to retrieve all transactions (DOCUMENT_NUM) that contain at least two products from a selection list ("A", "B", "C")

select TRANSACTION_DAY, DOCUMENT_NUM, PRODUCT_CODE, PRODUCT_DSCR, QTY_SOLD, NET_SALES
from main.sales
where PRODUCT_CODE in ("A", "B", "C")

If I use the IN operator, the query may return transactions with one product from the list, but I want at least two.

Is it possible to achieve that?

CodePudding user response:

Yes, it is possible to achieve that using group by and count

select distinct DOCUMENT_NUM
from main.sales
where PRODUCT_CODE in ("A", "B", "C")
group by DOCUMENT_NUM
having count(distinct PRODUCT_CODE)>=2

If you need all the columns, you can do a nested query

select TRANSACTION_DAY, DOCUMENT_NUM, PRODUCT_CODE, PRODUCT_DSCR, QTY_SOLD, NET_SALES
from main.sales
where DOCUMENT_NUM in (
  select distinct DOCUMENT_NUM
  from main.sales
  where PRODUCT_CODE in ("A", "B", "C")
  group by DOCUMENT_NUM
  having count(distinct PRODUCT_CODE)>=2
)
and PRODUCT_CODE in ("A", "B", "C")

CodePudding user response:

Alternatively, consider approx_count_distinct (faster than COUNT DISTINCT for larger datasets) to filter DOCUMENT_NUM by distinct PRODUCT_CODE:

select TRANSACTION_DAY, DOCUMENT_NUM, 
       PRODUCT_CODE, PRODUCT_DSCR, 
       QTY_SOLD, NET_SALES 
from main.sales 
where DOCUMENT_NUM IN (
   select DOCUMENT_NUM
   from main.sales
   group by DOCUMENT_NUM
   having approx_count_distinct(PRODUCT_CODE) 
          filter(where PRODUCT_CODE in ("A", "B", "C")) >= 2
)

CodePudding user response:

If you want to include transactions that have at least two of the same products as well, see code below.

CREATE TABLE test (
id INT
, cat CHAR)

INSERT INTO test(id, cat)
VALUES (1, 'a')
, (2, 'a')
, (3, 'a')
, (1, 'b')
, (2, 'a')
, (3, 'c')

SELECT id
, COUNT(id)
FROM test
WHERE cat IN ('a', 'b')
GROUP BY id
HAVING COUNT(id) > 1

If you only want transactions that have 2 or more unique products use below code to replace last block of code above:

SELECT id
, COUNT(id)
FROM (SELECT DISTINCT * FROM test) t
WHERE cat in ('a', 'b')
GROUP BY id
HAVING COUNT(id) > 1
  • Related