Home > OS >  SQL ALWAYS IN and/or NEVER IN filter
SQL ALWAYS IN and/or NEVER IN filter

Time:05-21

I have synthetic data such as

A, B
-----
1, 1
1, 2
2, 1
2, 1
3, 2
3, 2 

I have two questions I need to answer, given the below pseudo-code.

Question 1

SELECT A
WHERE B NEVER_IN('2')
FROM table

>>>
A
-
2

Question 2

SELECT A
WHERE B ALWAYS_IN('2')
FROM table

>>>
A
-
3

In both cases, A=1 was committed because A was sometimes but neither always or never equal to 2.

Is there a straightforward way to do this? Also, I use ALWAYS_IN and NEVER_IN because in practice, I would need to know if A was always equal to an element in an array or never equal to an element in an array.

What's the best way to accomplish this in SQL (using Presto)?

This is my best attempt at question w the ALWAYS_IN case, which is terribly slow:

WITH results AS (
    SELECT A, B
    FROM TABLE),   

possibly_good AS (
    SELECT A
    FROM results
    WHERE B IN (2)
),

bad AS (SELECT R.A 
        FROM results R 
        WHERE R.A NOT IN (
            SELECT P.A 
            FROM possibly_good P 
        )),

good AS (
    SELECT P.A
    FROM possibly_good P
    WHERE P.A NOT IN(
        SELECT B.A
        FROM bad B
    ))

SELECT * FROM good

CodePudding user response:

You can use group by and array_agg and then check the array contents (depended on Presto version either with all_match or work around with not contains or cardinality filter):

--sample data
WITH dataset (A, B) AS (
    VALUES  (1, 1),
        (1, 2),
        (2, 1),
        (2, 1),
        (3, 2),
        (3, 2)
)

-- query
select a
from
(
    select a
       , array_agg(b) agg -- concatenate into array
    from dataset
    group by a
)
where 
    not contains(agg, 2) -- NEVER_IN('2')
    -- For ALWAYS_IN('2') use all_match if available otherwise:  
    -- cardinality(agg) = cardinality(filter(agg, v -> v = 2)) 

Output:

a
2
  • Related