What is the proper way to filter grouped data so that a group is kept when at least one of its rows has a given value, without limiting the result to rows with that value?
Customer_ID | Item
ab123 | Cookie
ab123 | Juice
ab123 | Towel
bd345 | Juice
bd345 | Coffee
VG555 | Cookie
VG555 | Cookie
VG555 | Water
I want all customer IDs that have at least one item that is a cookie.
CodePudding user response:
An EXISTS is sometimes useful for this.
SELECT Customer_ID
FROM customers cust
WHERE EXISTS (
    SELECT 1
    FROM purchases purch
    WHERE purch.Customer_ID = cust.Customer_ID
      AND purch.Item = 'Cookie'
);
But since this is only about customers with cookies, a DISTINCT or GROUP BY could be enough.
SELECT Customer_ID, COUNT(Item) AS Cookies
FROM purchases
WHERE Item = 'Cookie'
GROUP BY Customer_ID
CodePudding user response:
You want all customer IDs that have at least one item that is a cookie. For this you'd simply select the customer IDs from the rows where the item equals 'Cookie'. In order to get rid of duplicates, you'd use DISTINCT:
select distinct customer_id from mytable where item = 'Cookie';
With a big table and many duplicates, this can take a long time, though. Imagine 1000 customers with 1000 cookies each: that is a million rows to sort and aggregate in order to get the 1000 result rows. In that case a mere lookup might be more appropriate:
select customer_id
from customer
where customer_id in (select customer_id from mytable where item = 'Cookie');
Thus the DBMS can stop reading a customer's rows once it has found the first cookie for them.
Update: Snowflake doesn't support indexes, so try both queries and see which one gets executed faster. On a DBMS that does support indexes, the following index would help the lookup:
create index idx on mytable (item, customer_id);
CodePudding user response:
Using QUALIFY and COUNT_IF:
SELECT *
FROM Tab
QUALIFY COUNT_IF(Item='Cookie') OVER(PARTITION BY Customer_ID) > 0
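Note that the QUALIFY query keeps every row of a qualifying customer, not just the customer ID. For engines without QUALIFY or COUNT_IF, roughly the same result can be sketched with a window function in a derived table. This is only a sketch: the sample rows are inlined from the question, and the names `counted` and `cookie_count` are made up for illustration.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH Tab AS (
    SELECT *
    FROM (VALUES
        ('ab123', 'Cookie'),
        ('ab123', 'Juice'),
        ('ab123', 'Towel'),
        ('bd345', 'Juice'),
        ('bd345', 'Coffee'),
        ('VG555', 'Cookie'),
        ('VG555', 'Cookie'),
        ('VG555', 'Water')
    ) AS v (Customer_ID, Item)
)
SELECT Customer_ID, Item
FROM (
    SELECT Customer_ID,
           Item,
           -- Per-customer cookie count, repeated on each of that customer's rows.
           SUM(CASE WHEN Item = 'Cookie' THEN 1 ELSE 0 END)
               OVER (PARTITION BY Customer_ID) AS cookie_count  -- hypothetical alias
    FROM Tab
) counted                                                        -- hypothetical alias
WHERE cookie_count > 0;
-- Keeps all three rows of ab123 and all three rows of VG555; bd345 is filtered out.
```

If only the customer IDs are needed, the grouped variant is the simpler choice.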
Alternatively, using HAVING:
SELECT Customer_id
FROM tab
GROUP BY Customer_Id
HAVING COUNT_IF(Item='Cookie') > 0;
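COUNT_IF is not available in every DBMS. A minimal portable sketch of the same HAVING filter uses a conditional SUM instead. The sample rows are inlined from the question so the query runs on its own, and the derived-table alias `v` is only illustrative.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH tab AS (
    SELECT *
    FROM (VALUES
        ('ab123', 'Cookie'),
        ('ab123', 'Juice'),
        ('ab123', 'Towel'),
        ('bd345', 'Juice'),
        ('bd345', 'Coffee'),
        ('VG555', 'Cookie'),
        ('VG555', 'Cookie'),
        ('VG555', 'Water')
    ) AS v (Customer_ID, Item)
)
SELECT Customer_ID
FROM tab
GROUP BY Customer_ID
-- Count 1 for every cookie row and 0 otherwise; keep groups with at least one cookie.
HAVING SUM(CASE WHEN Item = 'Cookie' THEN 1 ELSE 0 END) > 0;
-- Returns ab123 and VG555 (row order is not guaranteed).
```

Unlike a plain WHERE Item = 'Cookie' filter, this form still sees the customer's other rows, so further conditions on the whole group can be added to the HAVING clause.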