Max of two years in string-CodePudding

If I have a string like this:

"I started in 1998 and finished around 2012"

I want to extract the "year" part from the string (the string in between the years can be anything and in any length), and then take the most recent year (so max of the years), I was trying to maybe identify the year with this:

SUBSTRING(YourValue, NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue), 0), 4)

but I'm not sure how to get multiple years and then compare them.

Further, the most recent year doesn't have to be at the end of the string:

"I was done in 2013 but started in 1997."

I'm using SQL Server 2012 (v11.0.7507)

CodePudding user response：

Please try the following solution.

It is using XML and XQuery.

It will work starting from SQL Server 2012 onwards.

Notable points:

CROSS APPLY is converting input tokens column into XML. It is done for tokenization of each word in a sentence.
XQuery .query() method is using FLWOR expression to traverse all tokens, along the way it is checking for a YEAR data type (xs:gYear in XML parlance), and picking up a max year.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens varchar(MAX));
INSERT INTO @tbl (tokens) VALUES
('I started in 1998 and finished around 2012'),
('I was done in 2013 but started in 1997.'),
('I started in 2022.');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1)
    , @dot CHAR(1) = '.';

SELECT t.*
   , c.query('
      for $x in max(/root/r[xs:gYear(./text()[1]) instance of xs:gYear]/text())
      return $x
   ').value('.','CHAR(4)') AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
      REPLACE(REPLACE(tokens,@dot, ''), @separator, ']]></r><r><![CDATA[')   
      ']]></r></root>' AS XML)) AS t1(c);

Output

 ---- -------------------------------------------- -------- 
| id |                   tokens                   | Result |
 ---- -------------------------------------------- -------- 
|  1 | I started in 1998 and finished around 2012 |   2012 |
|  2 | I was done in 2013 but started in 1997.    |   2013 |
|  3 | I started in 2022.                         |   2022 |
 ---- -------------------------------------------- --------

CodePudding user response：

DECLARE @sometext VARCHAR(max) = 'I started in 1998 and finished around 2012';

select @sometext;

select TOP 1 * from string_split(@sometext,' ') where value between '1000' and '9999' order by value desc;

DBFIDDLE

or, maybe:

select TOP 1 *  
from string_split(replace(replace(@sometext,'.',' '),',',' '),' ')
where value between '1000' and '9999' 
order by value desc;

when your sentence ends with a '.'.

CodePudding user response：

It is not pretty, but as you can not tell how may dates there are, this is bound to be.

The CTE searches for dates, else returns NULL, that is in case no or only 1 date is there.

the following select check the lowest date as it must be the start date and also captures if one or both values of the CTE was NULL

CREATE TABLE t1 (id int, YourValue varchar(299))

INSERT INTO t1 VALUEs(1,'I started in 1998 and finished around 2012'),(2,'I was done in 2013 but started in 1997.')
,(2,'I started in 2022.')

WITH CTE AS 
(SELECT CASE WHEN PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) > 0 THEN
SUBSTRING(YourValue, NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue), 0), 4) ELSE NULL END
val1 ,
CASE WHEN PATINDEX('%[0-9][0-9][0-9][0-9]%', RIGHT(YourValue, LEn(YourValue) - PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) - 4)) > 0 
THEN SUBSTRING(RIGHT(YourValue, LEn(YourValue) - PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) - 4) ,
PATINDEX('%[0-9][0-9][0-9][0-9]%', RIGHT(YourValue, LEn(YourValue) - PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) - 4)),4) 
ELSE NULL END as val2
FROM t1)
SELECT CASE WHEN val2 IS NULL THEn val1 
ELSE
CASE WHeN val1 <= Val2 THEN val1 ELSE val2 END END as [STartdate]
,CASE WHEN val2 IS NULL THEn NULL
ELSE CASe WHEN val1 > Val2 THEN val1 ELSE val2 END END as [ENDdate] FROM CTE

STartdate | ENDdate
:-------- | :------
1998      | 2012   
1997      | 2013   
2022      | null

db<>fiddle here