Home > OS >  Extracting string that only exists in certain rows and is between 2 different delimiters
Extracting string that only exists in certain rows and is between 2 different delimiters

Time:01-10

I have the following table and I am trying to extract a string that only exists in certain columns and the string is between two different delimiters (, :).

df:

col1 col2
Patient 001 data retrieved: 9089800, John,Doe CA
Hospital stay AZ
Patient 002 data retrieved: 9123010, Steve,Doe NY
Patient 003 data retrieved: 9034291, Alex,Doe MI
Patient 004 information not found VT

df_final

col1 col2 result
Patient 001 data retrieved: 9089800, John,Doe CA 9089800
Hospital stay AZ
Patient 002 data retrieved: 9123010, Steve,Doe NY 9123010
Patient 003 data retrieved: 9034291, Alex,Doe MI 9034291
Patient 004 information not found VT

I understand that the way the data is currently is not efficient but this is the dataset/task I have been given. Is there anyway to work around this?

his is what I have so far but it just retrieves the entire string for all rows. Not sure what I am doing wrong.

SELECT TOP 100 *, 
SUBSTRING(col1,CHARINDEX('data retrieved:',col1) 1,
        (((LEN(col1))-CHARINDEX(',', REVERSE(col1)))-CHARINDEX('data retrieved:',col1))) AS Result 
FROM df

CodePudding user response:

A little bit more bullet proof:

trim(case when charindex(':', col1) <> 0 then
      case when charindex(',', col1, charindex(':', col1) 1) <> 0 then
          substring(col1, charindex(':', col1) 1, 
              charindex(',', col1, charindex(':', col1) 1) -
              charindex(':', col1) - 1
          ) 
      end
  end)

CodePudding user response:

Tired of extracting strings ... left, right, patindex, charindex, ...

Here is an option that uses a helper function which accepts two unlike delimeters. In this case a : and ,

Example

Declare @YourTable Table ([col1] varchar(50),[col2] varchar(50))  Insert Into @YourTable Values 
 ('Patient 001 data retrieved: 9089800, John,Doe','CA')
,('Hospital stay','AZ')
,('Patient 002 data retrieved: 9123010, Steve,Doe','NY')
,('Patient 003 data retrieved: 9034291, Alex,Doe','MI')
,('Patient 004 information not found','VT')
 
Select A.* 
      ,Result = B.RetVal
 From  @YourTable A
 Outer Apply [dbo].[tvf-Str-Extract-JSON](Col1,':',',') B

Results

enter image description here

The Function if Interested

CREATE FUNCTION [dbo].[tvf-Str-Extract-JSON] (@String nvarchar(max),@Delim1 nvarchar(100),@Delim2 nvarchar(100))
Returns Table 
As
Return (  

    Select RetSeq = row_number() over (order by RetSeq)
          ,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
    From  (
            Select RetSeq = [Key] 1
                  ,RetVal = trim(Value)
             From  OpenJSON( N'["' replace(string_escape(@String,'json'),@Delim1,'","') N'"]' )

          ) C1
    Where charindex(@Delim2,RetVal)>1

)

If you want to try without the TVF https://dbfiddle.uk/Aw9qByxC

  • Related