Home > Software design >  How to Detect Question Mark Invalid Character in SQL
How to Detect Question Mark Invalid Character in SQL

Time:10-02

I am working in a database that accepts imported files. When the client enters a registered trademark, copyright, or another invalid symbol, the database imports the symbol as an invalid character in the form of a question mark, like the following: enter image description here

lorem ipsum dolor sit amet, consectetur � lorem ipsum dolor sit amet, consectetur

When printing this character, it appears as such: enter image description here

lorem ipsum dolor sit amet, consectetur ? lorem ipsum dolor sit amet, consectetur

Is there a way to detect that symbol, as using a like statement doesn't detect the symbol.

The desired result is to be able to send a warning in a stored procedure that asks the user to check the inserted data to ensure validity.

Note: It is not enough to insert the string into a temp table and then check the temp table for question marks, as a question mark in the string is not uncommon and would create for more false positives than helpful alerts.

Thank you

CodePudding user response:

That special character is NCHAR(65533) but evades normal pattern matching using LIKE, CHARINDEX, PATINDEX, etc. I did find one way to detect it using TRANSLATE, by swapping the Unicode replacement character for a different Unicode character that can't possibly be in the data already. I picked an 8-pointed star (, NCHAR(10037)) but there are so many to choose from...

CREATE TABLE dbo.whatever(things nvarchar(32));

INSERT dbo.whatever(things) VALUES
  (N'this row is just fine.'),
  (N'well, here there is a � rhombus.'),
  (N'this row is just fine too.');

SELECT things
  FROM dbo.whatever
  WHERE TRANSLATE(things, nchar(65533), N'✵') LIKE N'%✵%';

Output:

well, here there is a � rhombus.

Also note the difference between print 'hi � there'; and print N'hi � there'; - don't be lazy, if your string is (or could contain) Unicode, always use the N'prefix'.

As Martin suggests, though, SQL Server can store whatever character is leading to the � - it is most likely because you are treating the file as ASCII, inserting them into a varchar column, or it is getting lost somewhere else along the way.

  • Related