Home > Software engineering >  TSQL Function to extract dates
TSQL Function to extract dates

Time:03-01

I have created a TSQL function that is able to extract dates from strings.

The function works when the date has no "."

Please help get the missing piece to make this function work with dates that have periods.

Dates with Periods come in 2 different flavors: MM.DD.YYYY and M.DD.YY

IF OBJECT_ID(N'tempdb..#fileNameTable', N'U') IS NOT NULL DROP TABLE #fileNameTable

CREATE TABLE #fileNameTable(
    [fName] [VARCHAR](250) NULL
                                                                                                            
)

INSERT INTO #fileNameTable (fName) SELECT '9999999991_Agent Name_08.02.2018.WAV'
INSERT INTO #fileNameTable (fName) SELECT '9999999999 - Internal ID 1446683 (Pedro) 6.26.17 WB.mp3'
INSERT INTO #fileNameTable (fName) SELECT '9999999998 - Internal ID 1464807 (John) 7.11.17.mp3'
INSERT INTO #fileNameTable (fName) SELECT '9999999997 - Internal ID 1447503 (Marta) 6.27.17.mp3'
INSERT INTO #fileNameTable (fName) SELECT '9999999996 - Internal ID 1437403 (Ruby) 6.20.17.mp3'
INSERT INTO #fileNameTable (fName) SELECT 'rc_20200817_1612_9999999995_NJ.wav'
INSERT INTO #fileNameTable (fName) SELECT 'rc_20200817_1543_9999999994_PA.wav'
INSERT INTO #fileNameTable (fName) SELECT 'rc_20200817_1211_9999999993_MA.wav'
INSERT INTO #fileNameTable (fName) SELECT 'rc_20200817_1211_9999999992_MD.wav'


SELECT *, [dbo].[ExtractDateFromFileName](fName) FROM #fileNameTable


CREATE FUNCTION [dbo].[ExtractDateFromFileName]
(
  @str VARCHAR(256)
)
RETURNS VARCHAR(256)
AS
BEGIN

DECLARE @validchars VARCHAR(MAX)
    SET @validchars = '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    DECLARE @idx INT
    SET @idx = PATINDEX('%'  @validchars  '%',@str)
    IF @idx > 0 AND 
        (@idx = LEN(@str)-8
        OR PATINDEX(SUBSTRING(@str,@idx 9,1),'[0-9]')=0)
        SET @str=SUBSTRING(@str ,PATINDEX('%'  @validchars  '%',@str), 8)
    ELSE SET @str = ''
    RETURN @str
END
GO

CodePudding user response:

I would suggest you implement this as a table-valued function, I had a go at rewriting your function to work for both types of dates in your sample data. It may still require some tweaking for edge-cases but should be mostly there.

Testing with 1.3 million rows (of your sample data, duplicated with a cartesian join) on my desktop PC, your current scalar-valued function has the following statistics time data:

SQL Server Execution Times: CPU time = 19886 ms, elapsed time = 19961 ms.

With the function below the same 1.3 million rows has the following times:

SQL Server Execution Times: CPU time = 7122 ms, elapsed time = 7296 ms.

The table-valued function is much faster because SQL Server is able to inline it into the query, rather than running a separate function stand-alone per row, ie, RBAR.

create or alter function [dbo].[ExtractDateFromFileName2](@str varchar(256))
returns table
as return
select coalesce(
    Try_Convert(date,Substring(string, d1.v,8),111), 
    Try_Convert(date,Substring(string, d3.v,10),104),
    Try_Convert(date,Substring(string, d2.v,7),1)
    ) [Value]
from (select @str string)s
outer apply (values(PatIndex('%[1-3][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%',string)))d1(v)
outer apply (values(PatIndex('%[0-9].[0-9][0-9].[0-9][0-9]%',string)))d2(v)
outer apply (values(PatIndex('%[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9][0-9]%',string)))d3(v);

See demo Fiddle

CodePudding user response:

Another approach:

CREATE FUNCTION dbo.ExtractDateFromGarbage ( @garbage varchar(250))
RETURNS TABLE WITH SCHEMABINDING
AS
  RETURN 
  (
    SELECT theDate = TRY_CONVERT(date, 
        x   REPLACE(LEFT(y, PATINDEX('%[^0-9]%', y '.')-1),'.','/'))
      FROM (SELECT g = @garbage) AS f 
    CROSS APPLY (VALUES (PATINDEX('%[01 ][0-9].%[0-9].[0-9][0-9]%', g),
       PATINDEX('%2[0-9][0-9][0-9][0-1][0-9][0-3][0-9]%', g))) AS v(s,t) 
    CROSS APPLY (VALUES (SUBSTRING(g, COALESCE(NULLIF(s,0),t), 250))) AS q(q) 
    CROSS APPLY (VALUES(LEFT(q,6), SUBSTRING(q,7,250))) AS z(x,y)
  );

It handles one edge case Stu's doesn't (mm.d.yy):

The moral of the story is still that you should be doing this somewhere else (and storing proper dates in the right data type once the data makes its way to SQL Server).

  • Related