subset a table to include rows associated with only one occurence of a set of entries-CodePudding

I have the following table for 4 individuals with their favorite fruit:

tbl:

ID         FRUIT
personA    banana
personB    apple
personC    orange
personD    grapefruit
personA    avocado
personB    banana
personC    melon
personD    pear
personA    banana

I would like to extract all the entries for the IDs that are associated with only one of the following: banana, apple, orange.

This means that I am only hoping to extract the rows for person A and person C. person B has both ("apple" AND "banana"). person D has none of the three desired fruits.

I currently have:

select *
from tbl t1
where exists (select 1
              from tbl t2
              where t1.ID=t2.ID and
                    (t2.fruit='banana' or
                     t2.fruit='apple' or
                     t2.fruit='orange'))
join (select ID, count(*) over (partition by t3.fruit)
      from (select distinct ID, fruit
              from tbl
              where fruit='banana' or
                    fruit='apple' or
                    fruit='orange')) t3
on t3.ID=t.ID and
   t3.cnt=1

Is there a better/simpler way to execute this? I would like to optimize it to reduce run time given the tables are quite large.

desired table:

ID         FRUIT
personA    banana
personC    orange
personA    avocado
personC    melon
personA    banana

CodePudding user response：

The subquery determines all IDs which have those three fruits and count how many individual fruits they have.

You know can join the ID to the table and only take those IDs which have only 1 fruit, if you also want three fruits you and add this in the join condition

SELECT 
    ID, FRUIT
FROM
    tbl t1
        JOIN
    (SELECT 
        ID, COUNT(DISTINCT FRUIT) countf
    FROM
        tbl
    WHERE
        FRUIT IN ('banana' , 'apple\'orange')
    GROUP BY 1) t2 ON t1.ID = t2.ID AND t2.countf = 1;

CodePudding user response：

We can use GROUP BY and HAVING COUNT(DISTINCT) to find the persons we want.

SELECT 
  a.ID,
  b.FRUITID
FROM tb1 a
JOIN tb1 b ON a.ID = b.ID
WHERE a.FRUITID IN ('banana', 'apple', 'orange')
GROUP BY 
  a.ID,
  b.FRUITID
HAVING COUNT(DISTINCT a.FRUITID) = 1;
GO

ID      | FRUITID
:------ | :------
personA | avocado
personA | banana 
personC | melon  
personC | orange

db<>fiddle here

CodePudding user response：

Alternative approach is to use COUNT_IF:

SELECT *
FROM tbl t
QUALIFY COUNT_IF(t.fruit = 'banana') OVER(PARTITION BY ID) = 1
    AND COUNT_IF(t.fruit = 'apple')  OVER(PARTITION BY ID) = 1
    AND COUNT_IF(t.fruit = 'orange') OVER(PARTITION BY ID) = 1;

CodePudding user response：

This can be done with:

select *
from data
qualify count( distinct iff(fruit in ('apple','banana','orange'), fruit, null)) over(partition by id) = 1

which gives:

ID	FRUIT
personA	avocado
personA	banana
personA	banana
personC	melon
personC	orange

this can be shown how it works by showing the intermediate state of the IFF and the COUNT DISTINCT like so:

select *
    ,iff(fruit in ('apple','banana','orange'), fruit, null) as v
    ,count( distinct v) over(partition by id) as c
from data

gives:

ID	FRUIT	V	C
personA	avocado		1
personA	banana	banana	1
personA	banana	banana	1
personB	apple	apple	2
personB	banana	banana	2
personC	melon		1
personC	orange	orange	1
personD	grapefruit		0
personD	pear		0

Thus personD is eliminated for have no magic fruit, and personB is eliminated for have too much.