CHARINDEX does not return the correct index into the search string-CodePudding

I have the following data in a single element in a database table:

Port Jefferson Station, NY 11776-2598   
Northport, NY 11768                     
Garden City, NY 11530

I am trying to break each line up into a new table which contains a city field, a state field and a zip code field.

I am running a script with the following code to pick up the City value:

Create function [dbo].[fnBuildCityState](@AddressCity varchar(40))
RETURNS @NameTable TABLE
(
    City VARCHAR(255),
    State VARCHAR(255),
    Zip VARCHAR(255)
) 
as
BEGIN

-- Index and word variables.
DECLARE @AccountNameEndIndex INT;
DECLARE @FirstWordEndIndex   INT;
DECLARE @SecondWordEndIndex  INT;
DECLARE @ThirdWordEndIndex   INT;
DECLARE @FirstWord       NVARCHAR(8);
DECLARE @SecondWord      NVARCHAR(50);
DECLARE @ThirdWord       NVARCHAR(50);

SET @AddressCity = RTRIM(ISNULL(@AddressCity,'No Address, City ZipCode'))
if (LEN(@AddressCity)<3)
BEGIN
    SET @AddressCity = 'No Address, City ZipCode'
END

-- Get the last index.
SET @AccountNameEndIndex = LEN(@AddressCity) - 1;
-- Get the first word.
SET @FirstWordEndIndex = CHARINDEX(',', @AddressCity, 0);
SET @FirstWord         = SUBSTRING(@AddressCity, 0, @FirstWordEndIndex);



INSERT INTO @NameTable (City, State, Zip)
VALUES (@FirstWord, @SecondWord, @ThirdWord)

RETURN
END

When I run the code, and look at the resulting table, I see this output:

City                     State                Zip
Port Jef                 NULL                 NULL
Northpor                 NULL                 NULL
Garden C                 NULL                 NULL

The line of code which should return the index to the location of the comma in the input is

SET @FirstWordEndIndex = CHARINDEX(',', @AddressCity, 0);

It appears to always return a value of 8 regardless of where the comma is in the string.

Any idea why?

CodePudding user response：

The specific answer to your question is because you defined @FirstWord as NVARCHAR(8) so its only going to contain up to 8 chars.

But the bigger question is why are you trying trying to do that RBAR? SQL is set-based for a reason... you can build a query to do everything you need in one hit e.g.

SELECT b.City, C.[State], c.ZipCode
FROM (
    VALUES
    ('Port Jefferson Station, NY 11776-2598'),
    ('Northport, NY 11768'),
    ('Garden City, NY 11530')
) x ([Address])
CROSS APPLY (
    VALUES
    (CHARINDEX(',', x.[Address], 0))
) a (FirstComma)
CROSS APPLY (
    VALUES
    (SUBSTRING(x.[Address],1,a.FirstComma-1)
    , SUBSTRING(x.[Address],a.FirstComma 2,LEN(x.[Address])))
) b (City,SecondString)
CROSS APPLY (
    VALUES
    (SUBSTRING(b.SecondString,1,2)
    , SUBSTRING(b.SecondString,4,LEN(b.SecondString)))
) c ([State],ZipCode);

Returns:

City	State	ZipCode
Port Jefferson Station	NY	11776-2598
Northport	NY	11768
Garden City	NY	11530

This is simplistic and doesn't cover your edge cases such as no address, or a state with a length other then 2 characters. But thats all stuff you can easily sort out.

The point of using using CROSS APPLY is to avoid having to repeat the same calculations in multiple places - which a query without CROSS APPLY would need to do.

CodePudding user response：

Please try the following solution.

It is based on tokenization instead of string parsing.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(100));
INSERT @tbl (tokens) VALUES
('Port Jefferson Station, NY 11776-2598'),
('Northport, NY 11768'),   
('Garden City, NY 11530');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

SELECT t.*
    , city = c.value('(/root/r[1]/text())[1]', 'VARCHAR(50)')
    , state = LEFT(TRIM(c.value('(/root/r[2]/text())[1]', 'VARCHAR(50)')), 2)
    , zip = SUBSTRING(TRIM(c.value('(/root/r[2]/text())[1]', 'VARCHAR(50)')), 3, 50)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
      REPLACE(tokens, @separator, ']]></r><r><![CDATA[')   
      ']]></r></root>' AS XML)) AS t1(c);

Output

ID	tokens	city	state	zip
1	Port Jefferson Station, NY 11776-2598	Port Jefferson Station	NY	11776-2598
2	Northport, NY 11768	Northport	NY	11768
3	Garden City, NY 11530	Garden City	NY	11530

CodePudding user response：

I tend to lean towards JSON when parsing strings

Example

Select City  = JSON_VALUE(JS1,'$[0]')
      ,State = JSON_VALUE(JS2,'$[0]')
      ,ZIP   = JSON_VALUE(JS2,'$[1]')
 From YourTable A
 Cross Apply (values ('["' replace(Address,',','","') '"]') ) B(JS1)
 Cross Apply (values ('["' replace(ltrim(JSON_VALUE(JS1,'$[1]')),' ','","') '"]') ) C(JS2)

Results

City                    State   ZIP
Port Jefferson Station  NY      11776-2598
Northport               NY      11768
Garden City             NY      11530

If you wanted to use as a TVF

Declare @AddressCity varchar(max) = 'Port Jefferson Station, NY 11776-2598'

Select City  = JSON_VALUE(JS1,'$[0]')
      ,State = JSON_VALUE(JS2,'$[0]')
      ,ZIP   = JSON_VALUE(JS2,'$[1]')
 From (values ('["' replace(@AddressCity,',','","') '"]') ) A(JS1)
 Cross Apply (values ('["' replace(ltrim(JSON_VALUE(JS1,'$[1]')),' ','","') '"]') ) B(JS2)