Home > other >  Make string_agg() return unique values only
Make string_agg() return unique values only

Time:11-11

I am working in SQL Server 2017 and I have the following two tables:

create table Computer (
  Id int Identity(1, 1) not null,
  Name varchar(100) not null,
  constraint pk_computer primary key (Id)
);

create table HardDisk (
  Id int Identity(1, 1) not null,
  Interface varchar(100) not null,
  ComputerId int not null,
  constraint pk_harddisk primary key (Id),
  constraint fk_computer_harddisk foreign key (ComputerId) references Computer(Id)
);

I have data such as:

screenshot of data in the tables

Query

My current query is the following:

-- select query
select c.Id as computer_id,
       string_agg(cast(hd.Interface as nvarchar(max)), ' ') as hard_disk_interfaces
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
group by c.Id;

This gets me the following:

 computer_id | hard_disk_interfaces
------------- ----------------------
 1           | SATA SAS
 2           | SATA SAS SAS SAS SATA

However, I only want the distinct values, I'd like to end up with:

 computer_id | hard_disk_interfaces
------------- ----------------------
 1           | SATA SAS
 2           | SATA SAS

I tried to put distinct in front of the string_agg, but that didn't work.

Here's a db-fiddle.

Incorrect syntax near the keyword 'distinct'.

CodePudding user response:

string_agg is missing that feature , so you have to prepare the distinct list you want then aggregate them :

select id , string_agg(interface,' ') hard_disk_interfaces
from (
   select distinct c.id, interface
   from Computer c
   left join HardDisk hd on c.Id = hd.ComputerId
) t group by id

for your original query :

select *
from ....
join (
 <query above> ) as temp 
...
group by ... , hard_disk_interfaces

CodePudding user response:

A couple of other ways:

;WITH cte AS 
(
  SELECT c.Id, Interface = CONVERT(varchar(max), hd.Interface)
    FROM dbo.Computer AS c
    LEFT OUTER JOIN dbo.HardDisk AS hd ON c.Id = hd.ComputerId
    GROUP BY c.Id, hd.Interface
)
SELECT Id, STRING_AGG(Interface, ' ')
FROM cte
GROUP BY Id;

or

SELECT c.Id, STRING_AGG(x.Interface, ' ')
FROM dbo.Computer AS c
OUTER APPLY
(
  SELECT Interface = CONVERT(varchar(max), Interface) 
    FROM dbo.HardDisk WHERE ComputerID = c.Id
    GROUP BY Interface
) AS x
GROUP BY c.Id;

If you are getting duplicates in a larger query with more joins, I would argue those duplicates are not duplicates coming out of STRING_AGG(), but rather duplicate rows coming from one or more of your 47 joins, not from this portion of the query. And I would guess that you still get those duplicates even if you leave out this join altogether.

  • Related