I have the following table and I am trying to extract a string that only exists in certain columns and the string is between two different delimiters (, :).
df:
col1 | col2 |
---|---|
Patient 001 data retrieved: 9089800, John,Doe | CA |
Hospital stay | AZ |
Patient 002 data retrieved: 9123010, Steve,Doe | NY |
Patient 003 data retrieved: 9034291, Alex,Doe | MI |
Patient 004 information not found | VT |
df_final
col1 | col2 | result |
---|---|---|
Patient 001 data retrieved: 9089800, John,Doe | CA | 9089800 |
Hospital stay | AZ | |
Patient 002 data retrieved: 9123010, Steve,Doe | NY | 9123010 |
Patient 003 data retrieved: 9034291, Alex,Doe | MI | 9034291 |
Patient 004 information not found | VT |
I understand that the way the data is currently is not efficient but this is the dataset/task I have been given. Is there anyway to work around this?
his is what I have so far but it just retrieves the entire string for all rows. Not sure what I am doing wrong.
SELECT TOP 100 *,
SUBSTRING(col1,CHARINDEX('data retrieved:',col1) 1,
(((LEN(col1))-CHARINDEX(',', REVERSE(col1)))-CHARINDEX('data retrieved:',col1))) AS Result
FROM df
CodePudding user response:
A little bit more bullet proof:
trim(case when charindex(':', col1) <> 0 then
case when charindex(',', col1, charindex(':', col1) 1) <> 0 then
substring(col1, charindex(':', col1) 1,
charindex(',', col1, charindex(':', col1) 1) -
charindex(':', col1) - 1
)
end
end)
CodePudding user response:
Tired of extracting strings ... left, right, patindex, charindex, ...
Here is an option that uses a helper function which accepts two unlike delimeters. In this case a :
and ,
Example
Declare @YourTable Table ([col1] varchar(50),[col2] varchar(50)) Insert Into @YourTable Values
('Patient 001 data retrieved: 9089800, John,Doe','CA')
,('Hospital stay','AZ')
,('Patient 002 data retrieved: 9123010, Steve,Doe','NY')
,('Patient 003 data retrieved: 9034291, Alex,Doe','MI')
,('Patient 004 information not found','VT')
Select A.*
,Result = B.RetVal
From @YourTable A
Outer Apply [dbo].[tvf-Str-Extract-JSON](Col1,':',',') B
Results
The Function if Interested
CREATE FUNCTION [dbo].[tvf-Str-Extract-JSON] (@String nvarchar(max),@Delim1 nvarchar(100),@Delim2 nvarchar(100))
Returns Table
As
Return (
Select RetSeq = row_number() over (order by RetSeq)
,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
From (
Select RetSeq = [Key] 1
,RetVal = trim(Value)
From OpenJSON( N'["' replace(string_escape(@String,'json'),@Delim1,'","') N'"]' )
) C1
Where charindex(@Delim2,RetVal)>1
)
If you want to try without the TVF https://dbfiddle.uk/Aw9qByxC