I am working in a database that accepts imported files. When the client enters a registered trademark, copyright, or another invalid symbol, the database imports the symbol as an invalid character in the form of a question mark, like the following:
lorem ipsum dolor sit amet, consectetur � lorem ipsum dolor sit amet, consectetur
When printing this character, it appears as such:
lorem ipsum dolor sit amet, consectetur ? lorem ipsum dolor sit amet, consectetur
Is there a way to detect that symbol, as using a like statement doesn't detect the symbol.
The desired result is to be able to send a warning in a stored procedure that asks the user to check the inserted data to ensure validity.
Note: It is not enough to insert the string into a temp table and then check the temp table for question marks, as a question mark in the string is not uncommon and would create for more false positives than helpful alerts.
Thank you
CodePudding user response:
That special character is NCHAR(65533)
but evades normal pattern matching using LIKE
, CHARINDEX
, PATINDEX
, etc. I did find one way to detect it using TRANSLATE
, by swapping the Unicode replacement character for a different Unicode character that can't possibly be in the data already. I picked an 8-pointed star (✵
, NCHAR(10037)
) but there are so many to choose from...
CREATE TABLE dbo.whatever(things nvarchar(32));
INSERT dbo.whatever(things) VALUES
(N'this row is just fine.'),
(N'well, here there is a � rhombus.'),
(N'this row is just fine too.');
SELECT things
FROM dbo.whatever
WHERE TRANSLATE(things, nchar(65533), N'✵') LIKE N'%✵%';
Output:
well, here there is a � rhombus.
Also note the difference between print 'hi � there';
and print N'hi � there';
- don't be lazy, if your string is (or could contain) Unicode, always use the N'prefix'
.
As Martin suggests, though, SQL Server can store whatever character is leading to the � - it is most likely because you are treating the file as ASCII, inserting them into a varchar column, or it is getting lost somewhere else along the way.