Home > Mobile >  SUBSTRING with multiple CHARINDEX options in query
SUBSTRING with multiple CHARINDEX options in query

Time:02-15

A column I'm trying to extract only the last name from has possible combinations like H.Naude or H. Naude or H Naude and I need just the Naude part. If I use the SUBSTRING method I'm able to get the characters after either the space or the point but how to test for all 3 possibilities? Unfortunately this data comes from an imported entry form so I have no control as to how the data is formatted. Currently I have the following which is only for the space character

SUBSTRING(H.PtsNonFemale, CHARINDEX('' '', H.PtsNonFemale)  1, DATALENGTH(H.PtsNonFemale) - CHARINDEX('' '', H.PtsNonFemale)  1 ) AS Female

Any assistance is much appreciated.

CodePudding user response:

Assuming the data provided covers all use-cases then you can simply test for each case in descending length order of the of the component you want to remove and use the first match (longest) found in the substring.

SELECT H.PtsNonFemale
    , SUBSTRING(H.PtsNonFemale, COALESCE(NULLIF(I.A,0) 6,NULLIF(I.B,0) 2,NULLIF(I.C,0) 1,NULLIF(I.D,0) 1), LEN(H.PtsNonFemale)) AS Female
FROM (
    VALUES
    ('H.Naude'), ('H. Naude'), ('H Naude'), ('A. M. Someone')
) H (PtsNonFemale)
CROSS APPLY (
    VALUES
    (PATINDEX('_. _. %', H.PtsNonFemale), CHARINDEX('. ',H.PtsNonFemale,0), CHARINDEX('.',H.PtsNonFemale,0), CHARINDEX(' ',H.PtsNonFemale,0))
) I (A, B, C, D);

Returns:

PtsNonFemale Female
H.Naude Naude
H. Naude Naude
H Naude Naude
A. M. Someone Someone

CodePudding user response:

It is much better to tokenize a string of tokens instead of parsing.

SQL Server XQuery allows to that very easily.

No need to call barrage of string functions: SUBSTRING(), COALESCE(), NULLIF(), LEN(), PATINDEX(), CHARINDEX(), etc. multiple times.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, PtsNonFemale VARCHAR(100));
INSERT INTO @tbl (PtsNonFemale) VALUES
('H.Naude'),
('H. Naude'),
('H Naude'),
('A. M. Someone');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = '.';

SELECT t.*
    , c.value('(/root/r[last()]/text())[1]', 'VARCHAR(30)') AS Female
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
        REPLACE(REPLACE(PtsNonFemale,SPACE(1),@separator), @separator, ']]></r><r><![CDATA[')   
        ']]></r></root>' AS XML)) AS t1(c);

Output

 ---- --------------- --------- 
| ID | PtsNonFemale  | Female  |
 ---- --------------- --------- 
|  1 | H.Naude       | Naude   |
|  2 | H. Naude      | Naude   |
|  3 | H Naude       | Naude   |
|  4 | A. M. Someone | Someone |
 ---- --------------- --------- 

CodePudding user response:

Use string_split():

select
  PtsNonFemale,
  (select top 1 value
   from string_split(PtsNonFemale, ' ', 1)
   order by ordinal desc) as Female
from mytable
  • Related