Home > other >  Removing duplicates from a comma-separated string and assigning them to a row in SQL
Removing duplicates from a comma-separated string and assigning them to a row in SQL

Time:12-17

I have a table consisting of three columns.

One of the columns consists of a comma-separated list of building numbers:

Id PropertyNumber BuildingNumbers
4315 11594 1,5,6,10,10,11
3434 44379 1,1,2,2,3,3,4,4

I'm trying to write an UPDATE SQL statement, which will remove the duplicates from the BuildingNumbers table, so that my table will look like this:

Id PropertyNumber BuildingNumbers
4315 11594 1,5,6,10,11
3434 44379 1,2,3,4

How can I achieve this?

I'm using a SQL Server 2019 and Microsoft SQL Server Management Studio 18.

CodePudding user response:

It seems like this is a quite complex task due to this bad idea to save data in that way. In the future, you should get rid of this structure.

I guess something like this should work correctly, but maybe you need to do some modifications:

WITH split AS
(SELECT id, PropertyNumber,
CAST(split.Value AS INT) AS BuildingNumbers
FROM yourtable
CROSS APPLY STRING_SPLIT(yourtable.BuildingNumbers,',') split
GROUP BY id, PropertyNumber, CAST(split.Value AS INT)),
updateData AS
(SELECT id, PropertyNumber, STRING_AGG(BuildingNumbers,',') AS BuildingNumbers
FROM split
GROUP BY id, PropertyNumber)
UPDATE y SET y.BuildingNumbers = ud.BuildingNumbers
FROM yourtable y
INNER JOIN updateData ud
ON y.id = ud.id
AND y.PropertyNumber = ud.PropertyNumber;

I created a fiddle based on your sample data which shows this should be ok: db<>fiddle

As you can replicate in the fiddle, the idea is following:

  1. Split the comma-separated list of values in the distinct different values per id.
  2. Build again a comma-separated list of those distinct values per id.
  3. Update the data, replace the previous comma-separated list with the new one.

A last note: This answer focuses on the way how to update your data, not on how to improve your data structure in the future, because this was not your question.

However, I recommend, as already said in the beginning of this answer, to review your structure and save the data in a better form. One option is shown in this answer.

CodePudding user response:

My recommendation is related to your design. I think you need to rethink your table structure. I will elaborate on Panagiotis Kanavos's reply. You should have a "Buildings" table and a "Properties" table. There will be a column in "Buildings" table for "PropertyID" with which you will specify the relationship for each individual Building. Your tables should look something like this;

PROPERTIES TABLE;

PropertyID PropertyName
4315 11594
3434 44379

BUILDINGS TABLE;

BuildingNumber PropertyID
1 4315
2 3434
1 3434
4 4315
2 4315
6 3434

This way, the connection between buildings and properties are managed in a proper way. You can set up FK relation between BuildingTable's PropertyID column and PropertyTable's PropertyID column, but you don't exactly have to. It just would be good practice and would reduce chance for errors down the road.

CodePudding user response:

Create a function to split a string by comma to table format the result:

CREATE FUNCTION [dbo].[SplitCode]
(
  @CodeList varchar(MAX)
)
RETURNS
@ParsedList table
(
Code int
)
AS
BEGIN

DECLARE @Code varchar(15), @Pos int

SET @CodeList = LTRIM(RTRIM(@CodeList))  ','
SET @Pos = CHARINDEX(',', @CodeList, 1)

IF REPLACE(@CodeList, ',', '') <> ''
BEGIN
  WHILE @Pos > 0
  BEGIN
    SET @Code = LTRIM(RTRIM(LEFT(@CodeList, @Pos - 1)))
    IF @Code <> ''
    BEGIN
      INSERT INTO @ParsedList (Code)
      VALUES (CAST(@Code AS int))
    END
    SET @CodeList = RIGHT(@CodeList, LEN(@CodeList) - @Pos)
    SET @Pos = CHARINDEX(',', @CodeList, 1)
  END
END
RETURN
END

Once you execute the above function, it will give the result as a table.

select distinct Code from [dbo].[SplitCode]('10,200,200,300')

In your update query, you need to use the above select query to perform your task.

  • Related