A column I'm trying to extract only the last name from has possible combinations like H.Naude
or H. Naude
or H Naude
and I need just the Naude
part. If I use the SUBSTRING
method I'm able to get the characters after either the space or the point but how to test for all 3 possibilities? Unfortunately this data comes from an imported entry form so I have no control as to how the data is formatted. Currently I have the following which is only for the space character
SUBSTRING(H.PtsNonFemale, CHARINDEX('' '', H.PtsNonFemale) 1, DATALENGTH(H.PtsNonFemale) - CHARINDEX('' '', H.PtsNonFemale) 1 ) AS Female
Any assistance is much appreciated.
CodePudding user response:
Assuming the data provided covers all use-cases then you can simply test for each case in descending length order of the of the component you want to remove and use the first match (longest) found in the substring.
SELECT H.PtsNonFemale
, SUBSTRING(H.PtsNonFemale, COALESCE(NULLIF(I.A,0) 6,NULLIF(I.B,0) 2,NULLIF(I.C,0) 1,NULLIF(I.D,0) 1), LEN(H.PtsNonFemale)) AS Female
FROM (
VALUES
('H.Naude'), ('H. Naude'), ('H Naude'), ('A. M. Someone')
) H (PtsNonFemale)
CROSS APPLY (
VALUES
(PATINDEX('_. _. %', H.PtsNonFemale), CHARINDEX('. ',H.PtsNonFemale,0), CHARINDEX('.',H.PtsNonFemale,0), CHARINDEX(' ',H.PtsNonFemale,0))
) I (A, B, C, D);
Returns:
PtsNonFemale | Female |
---|---|
H.Naude | Naude |
H. Naude | Naude |
H Naude | Naude |
A. M. Someone | Someone |
CodePudding user response:
It is much better to tokenize a string of tokens instead of parsing.
SQL Server XQuery allows to that very easily.
No need to call barrage of string functions: SUBSTRING(), COALESCE(), NULLIF(), LEN(), PATINDEX(), CHARINDEX()
, etc. multiple times.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, PtsNonFemale VARCHAR(100));
INSERT INTO @tbl (PtsNonFemale) VALUES
('H.Naude'),
('H. Naude'),
('H Naude'),
('A. M. Someone');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '.';
SELECT t.*
, c.value('(/root/r[last()]/text())[1]', 'VARCHAR(30)') AS Female
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['
REPLACE(REPLACE(PtsNonFemale,SPACE(1),@separator), @separator, ']]></r><r><![CDATA[')
']]></r></root>' AS XML)) AS t1(c);
Output
---- --------------- ---------
| ID | PtsNonFemale | Female |
---- --------------- ---------
| 1 | H.Naude | Naude |
| 2 | H. Naude | Naude |
| 3 | H Naude | Naude |
| 4 | A. M. Someone | Someone |
---- --------------- ---------
CodePudding user response:
Use string_split()
:
select
PtsNonFemale,
(select top 1 value
from string_split(PtsNonFemale, ' ', 1)
order by ordinal desc) as Female
from mytable