Home > Software design >  How to extract date from a text field using TSQL for SQL Server
How to extract date from a text field using TSQL for SQL Server

Time:03-03

I'm trying to extract date from a text field. I'm using PADINDEX to do this. My results are very inconsistent. Is there anyway I can do this through SQL. I don't have an application to do this for me. I am trying to get this for a report that is needed.

The text field has dates that are not always listed as MM/DD/YYYY sometimes its listed as M/DD/YYYY. Also,

Here is the query I am using:

select
substring(ar.finding_text,patindex('%[0-9]%/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',ar.finding_text),10)

FROM [ARKPPDB].[PowerPath].[dbo].[accession_2] a
LEFT OUTER JOIN acc_results ar on a.id = ar.acc_id

CodePudding user response:

You could add a pattern "priority" like this:

DECLARE @Patterns TABLE(Pattern VARCHAR(50),PatternLength INT)

INSERT INTO @Patterns(Pattern, PatternLength)
VALUES('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] AM%',19),
    ('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] PM%',19),
    ('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] AM%',18),
    ('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] PM%',18),
    ('%[0-9][0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] AM%',18),
    ('%[0-9][0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] PM%',18),
    ('%[0-9][0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] AM%',17),
    ('%[0-9][0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] PM%',17),
    ('%[0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] AM%',18),
    ('%[0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] PM%',18),
    ('%[0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] AM%',17),
    ('%[0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] PM%',17),
    ('%[0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] AM%',17),
    ('%[0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9] PM%',17),
    ('%[0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] AM%',16),
    ('%[0-9]/[0-9]/[0-9][0-9][0-9][0-9] [0-9]:[0-9][0-9] PM%',16)

SELECT IIF(pat1.Pattern is null, null,substring(replace(UCASE(ar.finding_text),' AT ',' '),patindex(pat1.Pattern,replace(UCASE(ar.finding_text),' AT ',' ')),pat1.PatternLength))
FROM [ARKPPDB].[PowerPath].[dbo].[accession_2] a
LEFT OUTER JOIN acc_results ar on a.id = ar.acc_id
LEFT OUTER JOIN @Patterns pat1 on patindex(pat1.Pattern,ar.finding_text) > 0
LEFT OUTER JOIN @Patterns pat2 on patindex(pat2.Pattern,ar.finding_text) > 0 and pat2.PatternLength > pat1.PatternLength
WHERE pat2.Pattern IS NULL

So, some example text:

Physician/Physician’s office called on 2/1/2022 at 3:27 PM Central

This should return '2/1/2022 3:27 PM' which you should be able to convert to a DATETIME. I did not bother with the Central. Every example you showed included Central, so you can probably just assume it is Central time.

CodePudding user response:

I got a little more information about this post from another thread including a hint of what the data actually looks like. Here's the same code part of the response I created including the test data if someone else wants to play.

   DROP TABLE IF EXISTS #TestTable;
GO
--===== Create and populate the test table from the data provided.
     -- This is NOT a part of the solution. We're just creating test data here.
 SELECT *
   INTO #TestTable
   FROM (VALUES
         ('Physician/Physician’s office called on 2/1/2022 at 3:27 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 3:34 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 2:47 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 4:17 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 2:52 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 2:51 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 4:17 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 4:34 PM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 11:49 PM Eastern.')
        ,('Physician/Physician’s office called on 2/1/2022 at 11:27 AM Eastern.')
        ,('Physician/Physician’s office called on 2/1/2022 at 11:34 AM Eastern.')
        ,('Physician/Physician’s office called on 2/1/2022 at 10:47 AM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 9:17 AM Mountain.')
        ,('Physician/Physician’s office called on 2/1/2022 at 10:52 AM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 10:51 AM Central.')
        ,('Physician/Physician’s office called on 2/1/2022 at 9:17 AM Pacific.')
        ,('Physician/Physician’s office called on 2/1/2022 at 9:34 AM Pacific.')
        ,('Physician/Physician’s office called on 2/1/2022 at 11:49 AM Somewhere over the rainbow.')
        )v(SomeString)
;
--===== Let's see what we've got
 SELECT * FROM #TestTable
;
--===== Solve the given problem.
     -- So long as the following pattern is true in the source data, this should always work.
     --     WhoCalled by the string ' called on ' to determine the caller followed by...
     --     by a date string followed by the string ' at ' followed by a meridian time (Has AM/PM) for the CallDT followed by...
     --     a time zone name string for the TimeZone.
     -- The result is available as a DATETIME2(0) with no decimal seconds.
 SELECT SomeString
        ,WhoCalled  = LEFT(SomeString,ca1.CalledOn-1)
        ,CallDT     = CONVERT(DATETIME2(0),REPLACE(SUBSTRING(SomeString,ca1.CalledOn 10,ca2.Meridian-CalledOn-9),'at',''))
        ,TimeZone   = TRIM(' .' FROM SUBSTRING(SomeString,ca2.Meridian 2,500))  
   FROM #TestTable
  CROSS APPLY (VALUES(CHARINDEX(' called on ',SomeString)))ca1(CalledOn)
  CROSS APPLY (VALUES(PATINDEX('% [AP]M %',SomeString) 2))ca2(Meridian)
;
GO
  • Related