For SQL Server, let's say I have the following string:
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris)
I would like to get both the new first name and last name. SUBSTRING() would not be the proper way since there's no pattern on how many chars the new name has.
Thanks in advance.
CodePudding user response:
It would be best if you normalize your design.
However, with your current dataset following this pattern, you may use the following approach that uses CHARINDEX
in a recursive CTE and SUBSTRING
to extract the desired names.
WITH sample_data as (
select 'FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris)' as col
),
name_indexes AS (
SELECT
col,
CHARINDEX('(',col) as index_start ,
CHARINDEX(')',col) as index_end,
0 as iteration
FROM
sample_data
UNION ALL
SELECT
col,
CHARINDEX('(',col,index_start 1) as index_start ,
CHARINDEX(')',col,index_end 1) as index_end,
iteration 1 as iteration
FROM
name_indexes
WHERE
iteration < 3
),
name_pieces AS (
SELECT
*,
SUBSTRING(col,index_start 1,index_end - index_start-1) as name
FROM
name_indexes
),
name_changes AS (
SELECT
col,
MAX(CASE WHEN iteration=0 THEN name END) as original_first_name,
MAX(CASE WHEN iteration=1 THEN name END) as new_first_name,
MAX(CASE WHEN iteration=2 THEN name END) as original_last_name,
MAX(CASE WHEN iteration=3 THEN name END) as new_last_name
FROM
name_pieces
GROUP BY
col
)
SELECT *
FROM name_changes
col | original_first_name | new_first_name | original_last_name | new_last_name |
---|---|---|---|---|
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | Joe | Sally | Doe | Harris |
Working demo db<>fiddle here
The first CTE sample_data
with column col
is only for demonstration purposes as I try to replicate your table. You may omit this CTE and replace sample_data
with your table name and col
with your column storing the string to extract the changes in the query below:
WITH name_indexes AS (
SELECT
col,
CHARINDEX('(',col) as index_start ,
CHARINDEX(')',col) as index_end,
0 as iteration
FROM
sample_data
UNION ALL
SELECT
col,
CHARINDEX('(',col,index_start 1) as index_start ,
CHARINDEX(')',col,index_end 1) as index_end,
iteration 1 as iteration
FROM
name_indexes
WHERE
iteration < 3
),
name_pieces AS (
SELECT
*,
SUBSTRING(col,index_start 1,index_end - index_start-1) as name
FROM
name_indexes
),
name_changes AS (
SELECT
col,
MAX(CASE WHEN iteration=0 THEN name END) as original_first_name,
MAX(CASE WHEN iteration=1 THEN name END) as new_first_name,
MAX(CASE WHEN iteration=2 THEN name END) as original_last_name,
MAX(CASE WHEN iteration=3 THEN name END) as new_last_name
FROM
name_pieces
GROUP BY
col
)
SELECT *
FROM name_changes
The CTE name_indexes
is a recursive CTE aiming to extract the indexes of (
and )
to identify the start and end of each name change following the pattern in the string. iteration
is used to track the number of changes and this recursive CTE continues while there are less than 3 iterations. The output looks like for this sample string:
col | index_start | index_end | iteration |
---|---|---|---|
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 24 | 28 | 0 |
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 33 | 39 | 1 |
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 64 | 68 | 2 |
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 73 | 80 | 3 |
The CTE name_pieces
uses the SUBSTRING
function to extract the names from the string based on the indexes. The output of this CTE is
col | index_start | index_end | iteration | name |
---|---|---|---|---|
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 24 | 28 | 0 | Joe |
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 33 | 39 | 1 | Sally |
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 64 | 68 | 2 | Doe |
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | 73 | 80 | 3 | Harris |
Finally, the CTE name_changes
uses a group by and max function with case expressions to consolidate these changes into 1 row per string in the original dataset. You may choose your desired columns from this final projection.
col | original_first_name | new_first_name | original_last_name | new_last_name |
---|---|---|---|---|
FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris) | Joe | Sally | Doe | Harris |
Let me know if this works for you.
CodePudding user response:
Others have already noted that if you are storing that as data in some column in some table then that's not a great schema. But maybe you just have a varchar variable you're trying to parse... your question doesn't actually specify that this data is in some column in some table.
Here is a somewhat "imperative" solution using cross apply. Cross apply is nice in situations like this, because it lets you compose the logic from smaller parts in a way that is easy to read. Is it the "best" way? That depends upon what you mean by best. I would argue it is very easy to understand and therefore maintain.
declare @str varchar(1000)
= 'FirstName changed from (Joe) to (Sally), LastName changed from (Doe) to (Harris)';
select originalString = t.string,
newFirstName = substring
(
t.string,
newFirstStart.pos,
newFirstEnd.pos - newFirstStart.pos
),
newLastName = substring
(
t.string,
newLastStart.pos,
len(t.string) - newLastStart.pos
)
from (values (@str)) t(string)
cross apply (select charindex('to (', string, 0) 4) newFirstStart(pos)
cross apply (select charindex(')', string, newFirstStart.pos)) newFirstEnd(pos)
cross apply (select charindex('to (', string, newFirstEnd.pos) 4) newLastStart(pos);
The question is really how robust you need your solution to be. This answer assumes that your input string always has the parentheses around the names, and the rest of the string is invariant. If that's not exactly true, similar solutions may exist using more convoluted charindex
constructions, or patindex
, or even splitting the string into words (using one of the many available TSQL string split implementations, or the built in string_split
in SQL Server 2016 .