Home > Software engineering >  How to select number before a string pattern SQL
How to select number before a string pattern SQL

Time:08-01

I'm trying to get the number that comes right before the "years ago" portion of a string.

select substring('i wanted this 4 years ago so yah',0,patindex('% years ago%','i wanted this 4 years ago so yah')).

The problem is I'm assuming the starting position of the number is at 0, which is clearly not the case, so its bringing everything before that. Im having trouble trying to figure out what I should put as the starting position because the number can be any length and there could be multiple numbers in the string like below:

String Example: "I got this for 40 dollars about 2 years ago at the mall"

In this example, I still just want the 2.

Using SQL Server 11.0.7507

CodePudding user response:

I agree with the comments - you've now asked three T-SQL string parsing questions in quick succession (and not accepted any answers). Other languages are better suited to this task.

But just for fun, the following will work if the number you want immediately precedes the phrase "years ago", and it is a whole number (no decimals), and the sentence doesn't begin with the number (this condition can be eliminated if necessary), and that the relevant occurrence of the phrase "years ago" is the first one, if multiple exist.

declare @str varchar(200) = 'I got this for 40 dollars about 2 years ago at the mall';

with positions as
(
      select        string, yearsago, numLength
      from          (values (@str)) t (string)
      cross apply   (values (charindex(' years ago', string))) p1 (yearsago)
      cross apply   (values (patindex('%[^0-9]%', reverse(left(string, yearsago - 1))) - 1)) p2 (numLength)
)
select substring(string, yearsago - numLength, numLength)
from   positions;

CodePudding user response:

Trying to do this with SQL gets out of control very quickly as you have very limited string functions to work with and no RegEx, so if you have an option of moving this out of the database then take it:

Having said all that if you have to do it in the database then you need to start working with the limited pattern patching below is an example to get you started, it will pick out

'[0-9] years ago'

and

'[0-9][0-9] years ago' 

but will not pick out

'[0-9]   years ago'

You will have to keep nesting the patters you looking for and pity they person who has to maintain this, hopefully this is enough to convince you to move the logic of of the database.

Create table #TheTable (data varchar(256))


Insert into #TheTable values ('i wanted this 4 years ago so yah');
Insert into #TheTable values ('i wanted this 5  years ago so yah');
Insert into #TheTable values ('i wanted this 12 years ago so yah');
Insert into #TheTable values ('no data ');
Insert into #TheTable values (null);
Insert into #TheTable values ('I got this for 40 dollars about 2 years ago at the mall');
Insert into #TheTable values ('I got this for 40 dollars about 99 years ago at the mall');

Declare @Pattern1 varchar(50) = '%[0-9] years ago%'
Declare @Pattern2 varchar(50) = '%[0-9][0-9] years ago%'
Declare @Pattern3 varchar(50) = '%[0-9]  years ago%'
Declare @Pattern4 varchar(50) = '%[0-9][0-9]  years ago%'

Select IIF(PATINDEX(@Pattern2, data) = 0,
        
        (
            IIF(PATINDEX(@Pattern1, data) = 0,
            (
                IIF(PATINDEX(@Pattern3, data) = 0,
                (
                    IIF(PATINDEX(@Pattern4, data) = 0,
                    '-1',
                    SUBSTRING(data, PATINDEX(@Pattern4, data), 2))
                )
                ,
                SUBSTRING(data, PATINDEX(@Pattern3, data), 1))
            ),
            SUBSTRING(data, PATINDEX(@Pattern1, data), 1))
        ),
        
        SUBSTRING(data, PATINDEX(@Pattern2, data), 2)) as Years, data
data 
from #TheTable

drop table #TheTable

CodePudding user response:

Pattern matching only works when you have the exact phrasing

but you can do following, which will find double and single digit years

WITH CTE AS 
(SELECT id,YourValue ,CASE WHEN PATINDEX('%[0-9][0-9] year%', YourValue) > 0 THEN
PATINDEX('%[0-9][0-9] year%', YourValue) 
WHEN PATINDEX('%[0-9] year%', YourValue) > 0 THEN
PATINDEX('%[0-9] year%', YourValue) ELSe NULL END As [val1],
CASE WHEN PATINDEX('%[0-9] year%', YourValue) > 0 THEN PATINDEX('% year%', YourValue) ELSE NULL ENd as [val2]
FROM t1)
SELECT id, SUBSTRING(YourValue,val1,val2 -val1) as [years] FROM CTE 
WHERE val1 IS NOT NULL
id | years
-: | :----
 1 | 4    
 2 | 2    

db<>fiddle here

CodePudding user response:

Please try the following solution.

By using XML and XQuery, it is very easy to achieve the objectives.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens varchar(MAX));
INSERT INTO @tbl (tokens) VALUES
('I wanted this 4 years ago so yah'),
('I got this for 40 dollars about 2 years ago at the mall'),
('This happens when key words are missing');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);

SELECT t.*
   , NULLIF(c.query('
      for $x in /root/r[xs:int(./text()[1]) instance of xs:int]/text()
      let $pos := count(/root/r[. << $x[1]])
      return if (/root/r[$pos   1]/text() = "years" 
                and /root/r[$pos   2]/text() = "ago" ) then $x
            else ()
   ').value('.','INT'), 0) AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
      REPLACE(tokens, @separator, ']]></r><r><![CDATA[')   
      ']]></r></root>' AS XML)) AS t1(c);

Output

 ---- --------------------------------------------------------- -------- 
| id |                         tokens                          | Result |
 ---- --------------------------------------------------------- -------- 
|  1 | I wanted this 4 years ago so yah                        | 4      |
|  2 | I got this for 40 dollars about 2 years ago at the mall | 2      |
|  3 | This happens when key words are missing                 | NULL   |
 ---- --------------------------------------------------------- -------- 
  • Related