Home > Enterprise >  SQL: How to parse a string only where certain character patterns exist
SQL: How to parse a string only where certain character patterns exist

Time:09-17

I'm aware of using various functions like CHARINDEX and PATINDEX, but only for a single "spot" in a string. What I'm needing to do is parse out the characters of a string that exist between a < and >. For example:

<ul><li>Good afternoon,<br /><br />Please see the NDC number listed on the attached delivery ticket.<br /><br />Best regards,<br /><br />Rebecca Lawrence<br />Clinical Pharmacist<br /> <br /> <br /><br />From: Merino,Jason L <[email protected]> <br />Sent: Monday, September 12, 2022 1:17 PM<br />To: YYY <[email protected]><br />Subject: NDC number<br /><br /><br />Hello! Could I please get the NDC number for Lewis Trantham's ID number 24628 medication? I just received it and it is not listed.  <br /> <br /> <br />Thank you!<br />Jason Merino<br />Office Coordinator<br />Specialty Clinic of City<br />(555) 555-5555</li></ul>

So it could be <br>, <br />, <[email_address]>, etc. Given the range of possibility of text that could exist here, I have no idea how to go about removing these bits of characters. As stated, I'm only aware of how to do this when the string is much more predictable. And this is stored as a VARCHAR by the way.

CodePudding user response:

I am assuming you want the actual text.

Example

Declare @S varchar(max) = '<ul><li>Good afternoon,<br /><br />Please see the NDC number listed on the attached delivery ticket.<br /><br />Best regards,<br /><br />Rebecca Lawrence<br />Clinical Pharmacist<br /> <br /> <br /><br />From: Merino,Jason L <[email protected]> <br />Sent: Monday, September 12, 2022 1:17 PM<br />To: YYY <[email protected]><br />Subject: NDC number<br /><br /><br />Hello! Could I please get the NDC number for Lewis Trantham''s ID number 24628 medication? I just received it and it is not listed.  <br /> <br /> <br />Thank you!<br />Jason Merino<br />Office Coordinator<br />Specialty Clinic of City<br />(555) 555-5555</li></ul>'

Select * From [dbo].[tvf-Str-Extract](@S,'>','<')

Results

RetSeq  RetVal
1       Good afternoon,
2       Please see the NDC number listed on the attached delivery ticket.
3       Best regards,
4       Rebecca Lawrence
5       Clinical Pharmacist
6       From: Merino,Jason L 
7       Sent: Monday, September 12, 2022 1:17 PM
8       To: YYY 
9       Subject: NDC number
10      Hello! Could I please get the NDC number for Lewis Trantham's ID number 24628 medication? I just received it and it is not listed.  
11      Thank you!
12      Jason Merino
13      Office Coordinator
14      Specialty Clinic of City
15      (555) 555-5555 

The TVF if Intrested

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table 
As
Return (  

    Select RetSeq = row_number() over (order by RetSeq)
          ,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
    From  (
            Select RetSeq = row_number() over (order by 1/0)
                  ,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
            From  ( values (convert(xml,'<x>'   replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>') '</x>').query('.'))) as A(XMLData)
            Cross Apply XMLData.nodes('x') AS B(i)
          ) C1
    Where charindex(@Delim2,RetVal)>1

)

Update ... One Line

Select string_agg(RetVal,' ') within group (order by retseq)  
  From [dbo].[tvf-Str-Extract](@S,'>','<')
  • Related