I'm trying to get the number that comes right before the "years ago" portion of a string.
select substring('i wanted this 4 years ago so yah',0,patindex('% years ago%','i wanted this 4 years ago so yah')).
The problem is I'm assuming the starting position of the number is at 0, which is clearly not the case, so its bringing everything before that. Im having trouble trying to figure out what I should put as the starting position because the number can be any length and there could be multiple numbers in the string like below:
String Example: "I got this for 40 dollars about 2 years ago at the mall"
In this example, I still just want the 2.
Using SQL Server 11.0.7507
CodePudding user response:
Please try the following solution.
By using XML and XQuery, it is very easy to achieve the objective.
Notable points:
CROSS APPLY
is converting input tokens column into XML. It is done for tokenization of each word in a sentence.- XQuery
.query()
method is using FLWOR expression to traverse all tokens, along the way it is checking for an INT data type, getting its position, and checking that very next two tokens are "years" and "ago".
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens varchar(MAX));
INSERT INTO @tbl (tokens) VALUES
('I wanted this 4 years ago so yah'),
('I got this for 40 dollars about 2 years ago at the mall'),
('This happens when key words are missing');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*
, NULLIF(c.query('
for $x in /root/r[xs:int(./text()[1]) instance of xs:int]/text()
let $pos := count(/root/r[. << $x[1]])
return if (/root/r[$pos 1]/text() = "years"
and /root/r[$pos 2]/text() = "ago" ) then $x
else ()
').value('.','INT'), 0) AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['
REPLACE(tokens, @separator, ']]></r><r><![CDATA[')
']]></r></root>' AS XML)) AS t1(c);
Output
---- --------------------------------------------------------- --------
| id | tokens | Result |
---- --------------------------------------------------------- --------
| 1 | I wanted this 4 years ago so yah | 4 |
| 2 | I got this for 40 dollars about 2 years ago at the mall | 2 |
| 3 | This happens when key words are missing | NULL |
---- --------------------------------------------------------- --------
CodePudding user response:
I agree with the comments - you've now asked three T-SQL string parsing questions in quick succession (and not accepted any answers). Other languages are better suited to this task.
But just for fun, the following will work if the number you want immediately precedes the phrase "years ago", and it is a whole number (no decimals), and the sentence doesn't begin with the number (this condition can be eliminated if necessary), and that the relevant occurrence of the phrase "years ago" is the first one, if multiple exist.
declare @str varchar(200) = 'I got this for 40 dollars about 2 years ago at the mall';
with positions as
(
select string, yearsago, numLength
from (values (@str)) t (string)
cross apply (values (charindex(' years ago', string))) p1 (yearsago)
cross apply (values (patindex('%[^0-9]%', reverse(left(string, yearsago - 1))) - 1)) p2 (numLength)
)
select substring(string, yearsago - numLength, numLength)
from positions;
CodePudding user response:
Trying to do this with SQL gets out of control very quickly as you have very limited string functions to work with and no RegEx, so if you have an option of moving this out of the database then take it:
Having said all that if you have to do it in the database then you need to start working with the limited pattern patching below is an example to get you started, it will pick out
'[0-9] years ago'
and
'[0-9][0-9] years ago'
but will not pick out
'[0-9] years ago'
You will have to keep nesting the patters you looking for and pity they person who has to maintain this, hopefully this is enough to convince you to move the logic of of the database.
Create table #TheTable (data varchar(256))
Insert into #TheTable values ('i wanted this 4 years ago so yah');
Insert into #TheTable values ('i wanted this 5 years ago so yah');
Insert into #TheTable values ('i wanted this 12 years ago so yah');
Insert into #TheTable values ('no data ');
Insert into #TheTable values (null);
Insert into #TheTable values ('I got this for 40 dollars about 2 years ago at the mall');
Insert into #TheTable values ('I got this for 40 dollars about 99 years ago at the mall');
Declare @Pattern1 varchar(50) = '%[0-9] years ago%'
Declare @Pattern2 varchar(50) = '%[0-9][0-9] years ago%'
Declare @Pattern3 varchar(50) = '%[0-9] years ago%'
Declare @Pattern4 varchar(50) = '%[0-9][0-9] years ago%'
Select IIF(PATINDEX(@Pattern2, data) = 0,
(
IIF(PATINDEX(@Pattern1, data) = 0,
(
IIF(PATINDEX(@Pattern3, data) = 0,
(
IIF(PATINDEX(@Pattern4, data) = 0,
'-1',
SUBSTRING(data, PATINDEX(@Pattern4, data), 2))
)
,
SUBSTRING(data, PATINDEX(@Pattern3, data), 1))
),
SUBSTRING(data, PATINDEX(@Pattern1, data), 1))
),
SUBSTRING(data, PATINDEX(@Pattern2, data), 2)) as Years, data
data
from #TheTable
drop table #TheTable
CodePudding user response:
Pattern matching only works when you have the exact phrasing
but you can do following, which will find double and single digit years
WITH CTE AS (SELECT id,YourValue ,CASE WHEN PATINDEX('%[0-9][0-9] year%', YourValue) > 0 THEN PATINDEX('%[0-9][0-9] year%', YourValue) WHEN PATINDEX('%[0-9] year%', YourValue) > 0 THEN PATINDEX('%[0-9] year%', YourValue) ELSe NULL END As [val1], CASE WHEN PATINDEX('%[0-9] year%', YourValue) > 0 THEN PATINDEX('% year%', YourValue) ELSE NULL ENd as [val2] FROM t1) SELECT id, SUBSTRING(YourValue,val1,val2 -val1) as [years] FROM CTE WHERE val1 IS NOT NULL
id | years -: | :---- 1 | 4 2 | 2
db<>fiddle here