I have the following table for 4 individuals with their favorite fruit:
tbl
:
ID FRUIT
personA banana
personB apple
personC orange
personD grapefruit
personA avocado
personB banana
personC melon
personD pear
personA banana
I would like to extract all the entries for the IDs that are associated with only one of the following: banana, apple, orange.
This means that I am only hoping to extract the rows for person A and person C. person B has both ("apple" AND "banana"). person D has none of the three desired fruits.
I currently have:
select *
from tbl t1
where exists (select 1
from tbl t2
where t1.ID=t2.ID and
(t2.fruit='banana' or
t2.fruit='apple' or
t2.fruit='orange'))
join (select ID, count(*) over (partition by t3.fruit)
from (select distinct ID, fruit
from tbl
where fruit='banana' or
fruit='apple' or
fruit='orange')) t3
on t3.ID=t.ID and
t3.cnt=1
Is there a better/simpler way to execute this? I would like to optimize it to reduce run time given the tables are quite large.
desired table
:
ID FRUIT
personA banana
personC orange
personA avocado
personC melon
personA banana
CodePudding user response:
The subquery determines all IDs which have those three fruits and count how many individual fruits they have.
You know can join the ID to the table and only take those IDs which have only 1 fruit, if you also want three fruits you and add this in the join condition
SELECT
ID, FRUIT
FROM
tbl t1
JOIN
(SELECT
ID, COUNT(DISTINCT FRUIT) countf
FROM
tbl
WHERE
FRUIT IN ('banana' , 'apple\'orange')
GROUP BY 1) t2 ON t1.ID = t2.ID AND t2.countf = 1;
CodePudding user response:
We can use GROUP BY and HAVING COUNT(DISTINCT) to find the persons we want.
SELECT a.ID, b.FRUITID FROM tb1 a JOIN tb1 b ON a.ID = b.ID WHERE a.FRUITID IN ('banana', 'apple', 'orange') GROUP BY a.ID, b.FRUITID HAVING COUNT(DISTINCT a.FRUITID) = 1; GO
ID | FRUITID :------ | :------ personA | avocado personA | banana personC | melon personC | orange
db<>fiddle here
CodePudding user response:
Alternative approach is to use COUNT_IF:
SELECT *
FROM tbl t
QUALIFY COUNT_IF(t.fruit = 'banana') OVER(PARTITION BY ID) = 1
AND COUNT_IF(t.fruit = 'apple') OVER(PARTITION BY ID) = 1
AND COUNT_IF(t.fruit = 'orange') OVER(PARTITION BY ID) = 1;
CodePudding user response:
This can be done with:
select *
from data
qualify count( distinct iff(fruit in ('apple','banana','orange'), fruit, null)) over(partition by id) = 1
which gives:
ID | FRUIT |
---|---|
personA | avocado |
personA | banana |
personA | banana |
personC | melon |
personC | orange |
this can be shown how it works by showing the intermediate state of the IFF and the COUNT DISTINCT like so:
select *
,iff(fruit in ('apple','banana','orange'), fruit, null) as v
,count( distinct v) over(partition by id) as c
from data
gives:
ID | FRUIT | V | C |
---|---|---|---|
personA | avocado | 1 | |
personA | banana | banana | 1 |
personA | banana | banana | 1 |
personB | apple | apple | 2 |
personB | banana | banana | 2 |
personC | melon | 1 | |
personC | orange | orange | 1 |
personD | grapefruit | 0 | |
personD | pear | 0 |
Thus personD
is eliminated for have no magic fruit, and personB
is eliminated for have too much.