Home > OS >  Select specific portion of string using Regex match
Select specific portion of string using Regex match

Time:10-11

Please consider the below table. I am trying to retrieve only the EUR amount within the Tax strings. Some records vary more than the other in size, but the float numbers are always there.

OrderID    SKU      Price    Tax
****       ****     ****     [<TV<standard#21.0#false#21.36#EUR>VT>]
****       ****     ****     [<TV<standard#21.0#false#7.21#EUR>VT>]
****       ****     ****     [<TV<standard#17.0#false#5.17#EUR>VT>]

I wrote a regular expression that matches what I need: \d \W\d returns me both float values within the string. In Oracle SQL I can simply get the second occurrence with a query like:

SELECT REGEXP_SUBSTR(column, '\d \W\d ',1,2) FROM table

Using the above approach I retrieve 21.36, 7.21 and 5.17 for those three records.

How can I achieve this with SQL Server?

CodePudding user response:

Obviously regex would be the likely tool of choice here. But SQL Server does not have much native regex support. Here is a pure SQL Server solution making use of PATINDEX and CHARINDEX. It is a bit verbose, but gets the job done:

SELECT
    SUBSTRING(Tax,
              CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax)   3)   1,
              CHARINDEX('#', Tax, CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax)   3)   1) -
              CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax)   3) - 1)
FROM yourTable;

screen capture from demo link below

Demo

CodePudding user response:

Please try the following solution.

The approach is using XML for tokenization of the tax column. It is producing an XML like below for each row:

<root>
  <r>[&lt;TV&lt;standard</r>
  <r>21.0</r>
  <r>false</r>
  <r>21.36</r>
  <r>EUR&gt;VT&gt;]</r>
</root>

4th r element is a monetary value in question.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tax VARCHAR(MAX));
INSERT INTO @tbl (Tax) VALUES
('[<TV<standard#21.0#false#21.36#EUR>VT>]'),
('[<TV<standard#21.0#false#7.21#EUR>VT>]'),
('[<TV<standard#17.0#false#5.17#EUR>VT>]');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = '#';

SELECT t.*
    , c.value('(/root/r[4]/text())[1]', 'DECIMAL(10,2)') AS result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
        REPLACE(tax, @separator, ']]></r><r><![CDATA[')   
        ']]></r></root>' AS XML)) AS t1(c);

Output

 ---- ----------------------------------------- -------- 
| ID |                   Tax                   | result |
 ---- ----------------------------------------- -------- 
|  1 | [<TV<standard#21.0#false#21.36#EUR>VT>] |  21.36 |
|  2 | [<TV<standard#21.0#false#7.21#EUR>VT>]  |   7.21 |
|  3 | [<TV<standard#17.0#false#5.17#EUR>VT>]  |   5.17 |
 ---- ----------------------------------------- -------- 
  • Related