I have synthetic data such as
A, B
-----
1, 1
1, 2
2, 1
2, 1
3, 2
3, 2
I have two questions I need to answer, given the below pseudo-code.
Question 1
SELECT A
WHERE B NEVER_IN('2')
FROM table
>>>
A
-
2
Question 2
SELECT A
WHERE B ALWAYS_IN('2')
FROM table
>>>
A
-
3
In both cases, A=1 was committed because A was sometimes but neither always or never equal to 2.
Is there a straightforward way to do this? Also, I use ALWAYS_IN and NEVER_IN because in practice, I would need to know if A was always equal to an element in an array or never equal to an element in an array.
What's the best way to accomplish this in SQL (using Presto)?
This is my best attempt at question w the ALWAYS_IN case, which is terribly slow:
WITH results AS (
SELECT A, B
FROM TABLE),
possibly_good AS (
SELECT A
FROM results
WHERE B IN (2)
),
bad AS (SELECT R.A
FROM results R
WHERE R.A NOT IN (
SELECT P.A
FROM possibly_good P
)),
good AS (
SELECT P.A
FROM possibly_good P
WHERE P.A NOT IN(
SELECT B.A
FROM bad B
))
SELECT * FROM good
CodePudding user response:
You can use group by
and array_agg
and then check the array contents (depended on Presto version either with all_match
or work around with not contains
or cardinality
filter
):
--sample data
WITH dataset (A, B) AS (
VALUES (1, 1),
(1, 2),
(2, 1),
(2, 1),
(3, 2),
(3, 2)
)
-- query
select a
from
(
select a
, array_agg(b) agg -- concatenate into array
from dataset
group by a
)
where
not contains(agg, 2) -- NEVER_IN('2')
-- For ALWAYS_IN('2') use all_match if available otherwise:
-- cardinality(agg) = cardinality(filter(agg, v -> v = 2))
Output:
a |
---|
2 |