Home > Software design >  T-SQL function to extract dates
T-SQL function to extract dates

Time:03-01

I have created a T-SQL function that is able to extract dates from strings.

The function works when the date has no ..

Please help get the missing piece to make this function work with dates that have periods.

Dates with periods come in 2 different flavors:

MM.DD.YYYY 
M.DD.YY

My function code:

IF OBJECT_ID(N'tempdb..#fileNameTable', N'U') IS NOT NULL 
    DROP TABLE #fileNameTable

CREATE TABLE #fileNameTable
(
    [fName] [VARCHAR](250) NULL
)

INSERT INTO #fileNameTable (fName) 
    SELECT '9999999991_Agent Name_08.02.2018.WAV'

INSERT INTO #fileNameTable (fName) 
    SELECT '9999999999 - Internal ID 1446683 (Pedro) 6.26.17 WB.mp3'

INSERT INTO #fileNameTable (fName) 
    SELECT '9999999998 - Internal ID 1464807 (John) 7.11.17.mp3'

INSERT INTO #fileNameTable (fName) 
    SELECT '9999999997 - Internal ID 1447503 (Marta) 6.27.17.mp3'

INSERT INTO #fileNameTable (fName) 
    SELECT '9999999996 - Internal ID 1437403 (Ruby) 6.20.17.mp3'

INSERT INTO #fileNameTable (fName) 
    SELECT 'rc_20200817_1612_9999999995_NJ.wav'

INSERT INTO #fileNameTable (fName) 
    SELECT 'rc_20200817_1543_9999999994_PA.wav'

INSERT INTO #fileNameTable (fName) 
    SELECT 'rc_20200817_1211_9999999993_MA.wav'

INSERT INTO #fileNameTable (fName) 
    SELECT 'rc_20200817_1211_9999999992_MD.wav'


SELECT *, [dbo].[ExtractDateFromFileName](fName) 
FROM #fileNameTable

CREATE FUNCTION [dbo].[ExtractDateFromFileName]
    (@str VARCHAR(256))
RETURNS VARCHAR(256)
AS
BEGIN
    DECLARE @validchars VARCHAR(MAX)
    SET @validchars = '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

    DECLARE @idx INT
    SET @idx = PATINDEX('%'  @validchars  '%',@str)

    IF @idx > 0 AND (@idx = LEN(@str) - 8
                     OR PATINDEX(SUBSTRING(@str, @idx   9, 1), '[0-9]') = 0)
        SET @str = SUBSTRING(@str, PATINDEX('%'   @validchars  '%', @str), 8)
    ELSE 
        SET @str = ''

    RETURN @str
END

CodePudding user response:

Another approach:

CREATE FUNCTION dbo.ExtractDateFromGarbage ( @garbage varchar(250))
RETURNS TABLE WITH SCHEMABINDING
AS
  RETURN 
  (
    SELECT theDate = TRY_CONVERT(date, 
        x   REPLACE(LEFT(y, PATINDEX('%[^0-9]%', y '.')-1),'.','/'))
      FROM (SELECT g = @garbage) AS f 
    CROSS APPLY (VALUES (PATINDEX('%[01 ][0-9].%[0-9].[0-9][0-9]%', g),
       PATINDEX('%2[0-9][0-9][0-9][0-1][0-9][0-3][0-9]%', g))) AS v(s,t) 
    CROSS APPLY (VALUES (SUBSTRING(g, COALESCE(NULLIF(s,0),t), 250))) AS q(q) 
    CROSS APPLY (VALUES(LEFT(q,6), SUBSTRING(q,7,250))) AS z(x,y)
  );

It handles one edge case Stu's doesn't (mm.d.yy):

The moral of the story is still that you should be doing this somewhere else (and storing proper dates in the right data type once the data makes its way to SQL Server).

CodePudding user response:

I would suggest you implement this as a table-valued function, I had a go at rewriting your function to work for both types of dates in your sample data. It may still require some tweaking for edge-cases but should be mostly there.

Testing with 1.3 million rows (of your sample data, duplicated with a cartesian join) on my desktop PC, your current scalar-valued function has the following statistics time data:

SQL Server Execution Times: CPU time = 19886 ms, elapsed time = 19961 ms.

With the function below the same 1.3 million rows has the following times:

SQL Server Execution Times: CPU time = 7122 ms, elapsed time = 7296 ms.

The table-valued function is much faster because SQL Server is able to inline it into the query, rather than running a separate function stand-alone per row, ie, RBAR.

create or alter function [dbo].[ExtractDateFromFileName2](@str varchar(256))
returns table
as return
select coalesce(
    Try_Convert(date,Substring(string, d1.v,8),111), 
    Try_Convert(date,Substring(string, d3.v,10),104),
    Try_Convert(date,Substring(string, d2.v,7),1)
    ) [Value]
from (select @str string)s
outer apply (values(PatIndex('%[1-3][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%',string)))d1(v)
outer apply (values(PatIndex('%[0-9].[0-9][0-9].[0-9][0-9]%',string)))d2(v)
outer apply (values(PatIndex('%[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9][0-9]%',string)))d3(v);

See demo Fiddle

  • Related