Home > Software engineering >  How to select DISTINCT records based on multiple columns and without considering their order
How to select DISTINCT records based on multiple columns and without considering their order

Time:04-02

I'm currently working on a SQL Server database and I would need a query that returns pairs of customers with the same city from a table that has this structure

Customer(ID, Name, Surname, City)

and this sample data

Name       | Surname   | City
----------- ----------- -----------
Foo        | Foo       | New York
----------- ----------- -----------
Bar        | Bar       | New York
----------- ----------- -----------
Alice      | A         | London
----------- ----------- -----------
Bob        | B         | London

I have tried defining a query that joins the Customer table itself

SELECT C1.Name   ' '   C1.Surname CustomerA, C2.Name   ' '   C2.Surname CustomerB, C1.City
FROM Customer C1 JOIN Customer C2 
     ON C1.City = C2.City
WHERE CustomerA <> CustomerB

but it gives me a table that looks like this

CustomerA  | CustomerB | City
----------- ----------- -----------
Foo Foo    | Bar Bar   | New York
----------- ----------- -----------
Bar Bar    | Foo Foo   | New York
----------- ----------- -----------
Alice A    | Bob B     | London
----------- ----------- -----------
Bob B      | Alice A   | London

with duplicated rows but with swapped customers.

My question is how would I have to do to select those rows once (e.g. for the first two results, it would be great only the first or the second row).

This would be an example of the expected result

CustomerA  | CustomerB | City
----------- ----------- -----------
Foo Foo    | Bar Bar   | New York
----------- ----------- -----------
Alice A    | Bob B     | London

CodePudding user response:

I think I understand what you are looking for but it seems over simplified to your actual problem. Your query you posted was incredibly close to working. You can't reference columns by their alias in the where predicates so you will need to use the string concatenation you had in your column. Then you can simply change the <> to either > or < so you only get one match. This example should work for your problem as I understand it.

declare @Customer table
(
    CustID int identity
    , Name varchar(10)
    , Surname varchar(10)
    , City varchar(10)
)

insert @Customer
select 'Foo', 'Foo', 'New York' union all
select 'Bar', 'Bar', 'New York' union all
select 'Smith', 'Smith', 'New York' union all
select 'Alice', 'A', 'London' union all
select 'Bob', 'B', 'London'

SELECT CustomerA = C1.Name   ' '   C1.Surname
    , CustomerB = C2.Name   ' '   C2.Surname
    , C1.City
FROM @Customer C1 
JOIN @Customer C2 ON C1.City = C2.City
where C1.Name   ' '   C1.Surname > C2.Name   ' '   C2.Surname

CodePudding user response:

You can use concat and group by clause for this query

select concat(C1.Name," ",C1.surname) as CustomerA, concat(C2.Name," ",C2.surname) CustomerB,C1.city 
from customer C1 
left join customer C2 
on C1.city=C2.city  and C1.name<>C2.name
group by C1.city;
  • Related