Getting "?" character in front of certain strings when using SSIS to convert view nvarchar-CodePudding

I am using SSIS to convert and move data from a SQL Server view made up of various tables with lots of joins. All data is originally from Sap B1 pulled into my required tables made up of nvarchar data then converted to varchar data as that is what my destination needs.

I regularly pull data directly from SAP into SQL Server and this has not been a problem until recently (I pull this and more data regularly and no other data is giving me this trouble only the following so far and would like to get to the bottom of it so I can understand why it is happening).

SSIS Flow is literally like this (AllQuotes_View) > (Data Convertion) converting from nvarchar to "String [DT_STR]" 100 length, codepage (1252 ANSI - Latin I) for the column in question" > (AllQuotes_Table)

Destination table create code


SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[AllQuotes_Table]
(
    [quote_no] [int] NULL,
    [quote_prepared_for] [varchar](50) NULL,
    [quote_customer] [varchar](100) NULL,
    [quote_ref] [varchar](100) NULL,
    [quote_prepared_by] [varchar](155) NULL,
    [quote_date] [varchar](30) NULL,
    [quote_currency] [varchar](30) NULL,
    [quote_currency_letters] [varchar](50) NULL,
    [quote_total] [numeric](20, 2) NULL,
    [product_qty] [numeric](20, 0) NULL,
    [product_sku] [varchar](50) NULL,
    [product_description] [varchar](250) NULL,
    [product_stock] [varchar](100) NULL,
    [product_unit] [numeric](21, 6) NULL,
    [product_total] [numeric](38, 7) NULL,
    [LineNum] [int] NULL,
    [Address] [varchar](500) NULL,
    [U_EmailCont1] [varchar](250) NULL
) ON [PRIMARY]
GO

This has worked before for over a year successfully in the past but lately, some of the data in column product_sku have a "?" added in front of the expected string. E.G. when moving a row containing product_sku '‎5HB12A' from the (AllQuotes_View) through the data conversion shown above to the (AllQuotes_Table) the data when doing a select on the table comes out as '?5HB12A'

To note when viewing the data in the view in the designer or using a select statement from the view in Management studio before converting and moving to my destination table it doesn't have the leading "?".

Can anyone shed any light on this, I have tried googling but it is a hard subject to put into short search terms.

I can just run the code below on the destination table to remove the "?" character from the start of the string.

UPDATE AllQuotes_Table 
SET product_sku = RIGHT(product_sku, LEN(product_sku) - 1) 
WHERE product_sku LIKE '?%';

However, this is a quick fix and is not getting to the root of the problem and I wish to find out why this is actually happening.

I have run a select on all columns to see if the same "?" is being added to any other columns but it is not.

This has run successfully over 1000's quotes and product_sku's is the only column that seems to be affected and the only data that this has happened on so far are the following strings

?RDR-80582AKU
?4ZB97A#B19
?5HB12A
?EG-2000 IT
?EG-2000-01

Can anyone suggest anything to try to find out why this is suddenly happening or a fix to my problem? without the replacement code as this seems like a bodge.

Thank you for any suggestions for a fix or a guide on anywhere to start looking.

CodePudding user response：

It means some unprintable character is coming in the original product_sku, which is not able to converted to VARCHAR datatype. I would suggest you to do the below, as part of the data pull.

SQL SERVER 2017

TRIM(product_sku)

Earlier versions of SQL SERVER 2017

LTRIM(RTRIM(product_sku))

CodePudding user response：

Why this is happening?

As you mentioned you are importing data from a Unicode column (nvarchar) to a non-Unicode column (varchar). The ? is added since there is a Unicode character that cannot be converted to the 1252 ANSI - Latin I code page.

What is the Unicode character?

To check for this character, temporarily add a derived column with the following expression:

CODEPOINT([column]

The CODEPOINT function is used to:

Returns the Unicode code point of the leftmost character of a character expression.

Then, add a data viewer to check the values generated.

Once you know the Unicode value, you can check it from the Unicode Table

How to remove it?

To remove this character from the beginning of the string value, you can use the following derived column expression (assuming that the Unicode character code is 10):

CODEPOINT([column]) == 10 ? SUBSTRING([column],2,LEN([column]) - 1) : [column]