Home > Enterprise >  SQL Return only specific characters in SELECT from VARCHAR column
SQL Return only specific characters in SELECT from VARCHAR column

Time:12-21

I am stuck on an issue in SQL Server. I have a VARCHAR column called Name in my table:

enter image description here

I am trying to get the column to only return valid characters when doing a select on it. For example, I am only accepting any letters [A-Z], numbers [0-9] or a question mark [?] but list can change so need to be flexible. The reason why I am only accepting certain characters is due to our supplier specification which I send data to. It will break their system if I send then an invalid character.

SELECT Name FROM @table

For the purpose of asking the question, I have included a small example below where I insert into a table variable. My question is aimed towards the select part as I am trying to work on data already inserted.

DECLARE @table AS TABLE
(
    ID INT ,
    Name VARCHAR(500) ,
    Age INT 
)

INSERT INTO @table 
VALUES (1, 'Hello ## World! Test8.?##', 23), 
       (2, 'Need specific characters only Test8.? ]]', 22)

-- Only accept [A-Z][0-9][?]
SELECT Name FROM @table

Please note, the scenario above is a small example and the data is just dummy data I just added to make it easier to ask the question. The data already exist. I have no control over it. I only have access to it and need to tidy it up via doing a select.

Expected results with only returning valid characters:

enter image description here

For first row it will return "Hello World Test8?" and for second row it will return "Need specific chatacters only Test8?".

What I have tried so far is doing a replace on the select to get the result:

-- Only accept [A-Z][0-9][?]
SELECT REPLACE(REPLACE(REPLACE(REPLACE(Name, '#', ''), '!', ''), ']', ''), '.', '') FROM @table

However, this only works if I knew which characters are invalid. As mentioned earlier in question, I only know the opposite which are valid characters. A valid character is a letter [A-Z] or number [0-9] or a question mark. This means I have a massive list of invalid characters I need to add if I went towards a replace solution.

Any idea how I can achieve this within the select statement?

I am on SQL Server Version 2012.

CodePudding user response:

There is no built-in functionality for this, though this was implemented by people before:

https://raresql.com/2013/03/11/sql-server-function-to-parse-alphanumeric-characters-from-string/

Using this (all copyrights to the author) would be:

CREATE FUNCTION dbo.[UDF_Extract_Alphanumeric_From_String]
(
@String VARCHAR(MAX) -- Variable for string
)
RETURNS VARCHAR(MAX)
BEGIN
DECLARE @RETURN_STRING VARCHAR(MAX)
 
; WITH  N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2(n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3(n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4(n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n)
FROM N3 AS X, N3 AS Y)
 
SELECT @RETURN_STRING=ISNULL(@RETURN_STRING,'')  SUBSTRING(@String,Nums.n,1)
FROM N4 Nums
WHERE Nums.n <=LEN(@String) AND PATINDEX('%[0-9A-Za-z ]%',SUBSTRING(@String,Nums.n,1)) > 0
 
RETURN @RETURN_STRING
END
 
GO
SELECT dbo.[UDF_Extract_Alphanumeric_From_String] ('Hello ## World! Test8.?##') as [Result]

--OUTPUT

Result
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hello  World Test8

(1 row affected)


Completion time: 2022-12-20T22:47:24.8872397 01:00

CodePudding user response:

Here's a different approach with a UDF...

CREATE FUNCTION LeaveValidChars 
(
    @p1 varchar(100)
)
RETURNS varchar(100)
AS
BEGIN
    DECLARE @Result varchar(100)='', @p INT = 0, @c CHAR(1);
    WHILE @p < LEN(@p1)
    BEGIN
        SET @c=substring(@p1, @p, 1)
        IF CHARINDEX(@c,'ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890?')>0 
            SET @result=@result @c;
        SET @p=@p 1
    END
    RETURN @Result;
END
GO
  • Related