I am stuck on an issue in SQL Server. I have a VARCHAR
column called Name
in my table:
I am trying to get the column to only return valid characters when doing a select on it. For example, I am only accepting any letters [A-Z], numbers [0-9] or a question mark [?] but list can change so need to be flexible. The reason why I am only accepting certain characters is due to our supplier specification which I send data to. It will break their system if I send then an invalid character.
SELECT Name FROM @table
For the purpose of asking the question, I have included a small example below where I insert into a table variable. My question is aimed towards the select part as I am trying to work on data already inserted.
DECLARE @table AS TABLE
(
ID INT ,
Name VARCHAR(500) ,
Age INT
)
INSERT INTO @table
VALUES (1, 'Hello ## World! Test8.?##', 23),
(2, 'Need specific characters only Test8.? ]]', 22)
-- Only accept [A-Z][0-9][?]
SELECT Name FROM @table
Please note, the scenario above is a small example and the data is just dummy data I just added to make it easier to ask the question. The data already exist. I have no control over it. I only have access to it and need to tidy it up via doing a select.
Expected results with only returning valid characters:
For first row it will return "Hello World Test8?" and for second row it will return "Need specific chatacters only Test8?".
What I have tried so far is doing a replace on the select to get the result:
-- Only accept [A-Z][0-9][?]
SELECT REPLACE(REPLACE(REPLACE(REPLACE(Name, '#', ''), '!', ''), ']', ''), '.', '') FROM @table
However, this only works if I knew which characters are invalid. As mentioned earlier in question, I only know the opposite which are valid characters. A valid character is a letter [A-Z] or number [0-9] or a question mark. This means I have a massive list of invalid characters I need to add if I went towards a replace solution.
Any idea how I can achieve this within the select statement?
I am on SQL Server Version 2012.
CodePudding user response:
There is no built-in functionality for this, though this was implemented by people before:
https://raresql.com/2013/03/11/sql-server-function-to-parse-alphanumeric-characters-from-string/
Using this (all copyrights to the author) would be:
CREATE FUNCTION dbo.[UDF_Extract_Alphanumeric_From_String]
(
@String VARCHAR(MAX) -- Variable for string
)
RETURNS VARCHAR(MAX)
BEGIN
DECLARE @RETURN_STRING VARCHAR(MAX)
; WITH N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2(n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3(n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4(n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n)
FROM N3 AS X, N3 AS Y)
SELECT @RETURN_STRING=ISNULL(@RETURN_STRING,'') SUBSTRING(@String,Nums.n,1)
FROM N4 Nums
WHERE Nums.n <=LEN(@String) AND PATINDEX('%[0-9A-Za-z ]%',SUBSTRING(@String,Nums.n,1)) > 0
RETURN @RETURN_STRING
END
GO
SELECT dbo.[UDF_Extract_Alphanumeric_From_String] ('Hello ## World! Test8.?##') as [Result]
--OUTPUT
Result
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hello World Test8
(1 row affected)
Completion time: 2022-12-20T22:47:24.8872397 01:00
CodePudding user response:
Here's a different approach with a UDF...
CREATE FUNCTION LeaveValidChars
(
@p1 varchar(100)
)
RETURNS varchar(100)
AS
BEGIN
DECLARE @Result varchar(100)='', @p INT = 0, @c CHAR(1);
WHILE @p < LEN(@p1)
BEGIN
SET @c=substring(@p1, @p, 1)
IF CHARINDEX(@c,'ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890?')>0
SET @result=@result @c;
SET @p=@p 1
END
RETURN @Result;
END
GO