Home > Enterprise >  Return rows that do not contain a subset of the same rows
Return rows that do not contain a subset of the same rows

Time:12-08

Let's say I have a table called activitiesthat has 2 columns: nameActivityand zipCode.

How do I write a query that returns all activities a for which there are no other activities b that:

  1. have the same zipCode
  2. that share the same nameActivity as another activity c. With other words, for this condition, we need to check that de nameActivity of activity b is unique.

Example:

nameActivity          zipCode
  kayaking              1000
  bouldering            1000
  museum                2000
  paintball             2000
  museum                3000
  karting               4000

Following all the conditions above this should result in:

nameActivity          zipCode
  paintball             2000
  museum                3000
  karting               4000

CodePudding user response:

For the first part of the condition :

SELECT (array_agg(nameActivity))[1]
     , zipCode
  FROM activities
 GROUP BY zipCode
 HAVING count(*) = 1

result :

array_agg   zipcode
karting     4000
museum      3000

(paintball, 2000) is excluded from the result because another nameActivity (museum) exists for the same zipCode.

The second part of the condition is unclear : what means "b share the same nameActivity as another activity c" ???

CodePudding user response:

well, this was a hard one. I think I got it right with some tinkering, hope it helps.

with data (nameActivity,          zipCode)as (
Select 'kayaking',              1000 from dual union all
Select  'bouldering',            1000 from dual union all
Select 'museum',               2000 from dual union all
Select 'paintball',             2000 from dual union all
Select 'museum',                3000 from dual union all
Select 'karting',               4000 from dual )
,duplicatesname as(
select nameActivity,zipCode
from(
select d1.nameActivity,max(zipCode) zipCode, count(*) cnt 
from data d1
group by d1.nameActivity)
where cnt >1)
,withoutduplicatename as(
select nameActivity,zipCode
from (
select d1.nameActivity,d1.zipCode,d2.nameActivity na2,d2.zipCode zi2
from data d1
left join duplicatesname d2 on d1.nameActivity = d2.nameActivity
and d1.zipCode != d2.zipCode
) where na2 is null)

,duplicatZipCode as(
select zipCode
from(
select zipCode , count(*) cnt 
from withoutduplicatename d1
group by d1.zipCode)
where cnt >1)

select nameActivity,zipCode
from withoutduplicatename d1
 where d1.zipCode not in (select zipCode from duplicatZipCode)
 order by zipCode

CodePudding user response:

I got near.
But kayaking can't be ignored.

create table activities (
nameActivity varchar(30),
zipCode int
)
insert into activities 
   (nameActivity,        zipCode) values
   ('kayaking',             1000) 
 , ('bouldering',           1000) 
 , ('museum',               2000) 
 , ('paintball',            2000) 
 , ('museum',               3000) 
 , ('karting',              4000) 
SELECT nameActivity, zipCode
FROM
(
    SELECT nameActivity, zipCode
    , ROW_NUMBER() OVER (PARTITION BY zipCode) rn
    FROM activities a
    WHERE NOT EXISTS (
       select 1
       from activities a2
       group by a2.zipCode
       having count(*) = 1
          and a2.zipCode != a.zipCode
          and max(a2.nameActivity) = a.nameActivity
    )
) q
WHERE rn = 1
nameactivity | zipcode
:----------- | ------:
kayaking     |    1000
paintball    |    2000
museum       |    3000
karting      |    4000

db<>fiddle here

  • Related