I'm cleaning up some data and have several rows that contain repetitive words within a string. For instance, "concrete-concrete" or "art-art". I could just do use a case when with a like to find these, however there are so many of these repetitions that it will take too long to find all of them. Is there a SQL function for finding duplicative patterns like this?
Here's some sample data:
category
- concrete-concrete
- art-art
- concrete-art
- music-classical
- music-music-classical
CodePudding user response:
Please check the script below that is developed in SQL Server 2016 that might help you on your task.
CREATE TABLE #temp(Id int,[StringValue] varchar(100))
INSERT INTO #temp VALUES
(1,'word1-art-word2-art'),
(2,'concrete-concrete-word1-word2'),
(3,'word1-word2-art-concrete'),
(4,'art-art-concrete-concrete')
SELECT Id --Gives the ID of string where repetitive word(s) exist
FROM #temp d
WHERE EXISTS(
SELECT value
FROM STRING_SPLIT(d.[StringValue],'-')
WHERE value<>''
GROUP BY value
HAVING COUNT(*)>1
)
SELECT Id, --shows the repetitive word(s) for each id.
(
SELECT state_code
FROM
(
SELECT STUFF((SELECT CAST(', ' value AS VARCHAR(MAX))
FROM STRING_SPLIT(d.[StringValue],'-')
WHERE value<>N''
GROUP BY value HAVING COUNT(*)>1
FOR XML PATH ('')), 1, 2, '') AS state_code
) q
) repeatingwords
FROM #temp d
WHERE EXISTS(SELECT value FROM STRING_SPLIT(d.[StringValue],'-') WHERE value<>N'' GROUP BY value HAVING COUNT(*)>1)
CodePudding user response:
I would find the using the having function and delete them.
SELECT category, COUNT(*)
FROM users
GROUP BY category
HAVING COUNT(*) > 1