SQL: rows that share several values on a specific column-CodePudding

I have a table Visited with 2 columns:

ID | City

ID is an integer, City is a string.

Note that none of the columns is a key by itself - we can have the same ID visiting several cities, and several different IDs in the same city.

Given a specific ID, I want to return all the IDs in the table that visited at least half of the places that the input ID did (not including themselves)

edit: We only count places that are the same. so if ID 1 visited cities a,b,c. ID 2 visited b,c,d. ID 3 visited c,d,e.

then for ID=1 we return only [2], because out of the three cities ID1 visited, ID3 visited only one

CodePudding user response：

Inner join the visited table with the list of cities visited by the specific id, then select ids with at least half of the number of rows when grouped by id.

with u as
    (select city as visitedBySpecificId from visited where id = *specificId*),
v as
    (select * from visited inner join u on city = visitedBySpecificId where id <> *specificId*)
(select id from v group by id having count(*) >= (select count(*) from u)/2.0)

Fiddle

CodePudding user response：

Join them and compare the counts.

create table suspect_tracking (id int, city varchar(30))

insert into suspect_tracking values 
  (1, 'Brussels'), (1,'London'), (1,'Paris')
, (1,'New York'), (1,'Bangkok'), (1, 'Hong Kong')
, (1,'Dubai'), (1,'Singapoor'), (1,'Rome')
, (1,'Macau'), (1, 'Istanbul'), (1,'Kuala Lumpur')
, (1,'Dehli'), (1,'Tokyo'), (1,'Moscow')

, (2,'New York'), (2,'Bangkok'), (2, 'Hong Kong')
, (2,'Dubai'), (2,'Singapoor'), (2,'Rome')
, (2,'Macau'), (2, 'Istanbul'), (2,'Kuala Lumpur')

, (3,'Macau'), (3, 'Istanbul'), (3,'Kuala Lumpur')
, (3,'Dehli'), (3,'Tokyo'), (3,'Moscow')

with cte_suspects as (
  select id, city
  from suspect_tracking
  group by id, city
)
, cte_prime_suspect as (
  select distinct id, city
  from suspect_tracking
  where id = 1
)
, cte_prime_total as (
  select id, count(city) as cities
  from cte_prime_suspect
  group by id
)
select sus.id
from cte_prime_suspect prime
join cte_prime_total primetot 
  on primetot.id = prime.id
join cte_suspects sus 
  on sus.city = prime.city and sus.id <> prime.id
group by prime.id, sus.id, primetot.cities
having count(sus.city) >= primetot.cities/2

| id |
| -: |
|  2 |

db<>fiddle here