I am working in SQL Server 2017 and I have the following two tables:
create table Computer (
Id int Identity(1, 1) not null,
Name varchar(100) not null,
constraint pk_computer primary key (Id)
);
create table HardDisk (
Id int Identity(1, 1) not null,
Interface varchar(100) not null,
ComputerId int not null,
constraint pk_harddisk primary key (Id),
constraint fk_computer_harddisk foreign key (ComputerId) references Computer(Id)
);
I have data such as:
Query
My current query is the following:
-- select query
select c.Id as computer_id,
string_agg(cast(hd.Interface as nvarchar(max)), ' ') as hard_disk_interfaces
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
group by c.Id;
This gets me the following:
computer_id | hard_disk_interfaces
------------- ----------------------
1 | SATA SAS
2 | SATA SAS SAS SAS SATA
However, I only want the distinct values, I'd like to end up with:
computer_id | hard_disk_interfaces
------------- ----------------------
1 | SATA SAS
2 | SATA SAS
I tried to put distinct
in front of the string_agg
, but that didn't work.
Incorrect syntax near the keyword 'distinct'.
CodePudding user response:
string_agg is missing that feature , so you have to prepare the distinct list you want then aggregate them :
select id , string_agg(interface,' ') hard_disk_interfaces
from (
select distinct c.id, interface
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
) t group by id
for your original query :
select *
from ....
join (
<query above> ) as temp
...
group by ... , hard_disk_interfaces
CodePudding user response:
A couple of other ways:
;WITH cte AS
(
SELECT c.Id, Interface = CONVERT(varchar(max), hd.Interface)
FROM dbo.Computer AS c
LEFT OUTER JOIN dbo.HardDisk AS hd ON c.Id = hd.ComputerId
GROUP BY c.Id, hd.Interface
)
SELECT Id, STRING_AGG(Interface, ' ')
FROM cte
GROUP BY Id;
or
SELECT c.Id, STRING_AGG(x.Interface, ' ')
FROM dbo.Computer AS c
OUTER APPLY
(
SELECT Interface = CONVERT(varchar(max), Interface)
FROM dbo.HardDisk WHERE ComputerID = c.Id
GROUP BY Interface
) AS x
GROUP BY c.Id;
- Example db<>fiddle
If you are getting duplicates in a larger query with more joins, I would argue those duplicates are not duplicates coming out of STRING_AGG()
, but rather duplicate rows coming from one or more of your 47 joins, not from this portion of the query. And I would guess that you still get those duplicates even if you leave out this join altogether.