Home > OS >  How do I dynamically extract substring from string?
How do I dynamically extract substring from string?

Time:03-25

I’m trying to dynamically extract a substring from a very long URL. For example, I may have the following URLs:

  1. https://www.google.com/ABCDEF Version=“0.0.00.0” GHIJK
  2. https://www.google.com/ABCDEFGH Version=“0.0.0.0” IJKLM
  3. https://www.google.com/ABC Version=“0.0.0.00” 12345

I am trying to extract the version code only (0.0.0.0).

This is what I have so far:

SELECT SUBSTR(col, INSTR(col, ‘Version=“‘) 9) 
FROM table

This query returns the following result:

0.0.00.0” GHIJK … (url continues on)

So, I attempt to find “Version” in the link, so I can start from the same position in each row. This works fine, however I’m having a hard time dynamically locating the ending quote (“). I tried using INSTR in the third parameter of my SUBSTR function, like so:

SELECT SUBSTR(col, INSTR(col, ‘Version=“‘) 9, INSTR(col, ‘“‘)) 
FROM table

I figured that this would find the position of the ending quote, and then use that number for the length, but it returns a strange output. I’ve also used POSITION, CHARINDEX, LENGTH, and LOCATE. None of these functions work in Oracle.

I think maybe when I put 9 after the first INSTR function, it’s setting the query to a fixed position instead of a dynamic one, but I’m not sure how else to remove ‘Version=“‘.

CodePudding user response:

Here's one option (which, actually, selects what's between double quotes - that's version in your example; if there were some other similar substring, you'd get a wrong result).

with test (col) as
  (select 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK'  from dual union all
   select 'https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM' from dual union all
   select 'https://www.google.com/ABC Version="0.0.0.00" 12345'     from dual
  )
select col,
  replace(regexp_substr(col, '". "'), '"') version
from test;

which results in

https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK      0.0.00.0
https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM     0.0.0.0
https://www.google.com/ABC Version="0.0.0.00" 12345         0.0.0.00

CodePudding user response:

You can still use use INSTR to locate the second " in the string, then subtract the location of the first " to get the length that you need to get. Below is an example query:

SELECT col,
       SUBSTR (col, INSTR (col, '"')   1, INSTR (col, '"', 1, 2) - INSTR (col, '"') - 1)    version
  FROM test;

CodePudding user response:

You can use REGEXP_SUBSTR() with Version=(\d.*\d?) pattern in order to extract the piece between Version=" and "(your quotes are presumed to be regular double quotes " ")

SELECT REGEXP_SUBSTR(url,'Version="(\d.*\d)"',1,1,null,1) AS version        
  FROM t

where

  • the third argument(1) is position, the fourth argument(1) is occurence, and especially important to use the last one as being capture group (1)
  • indeed using '"(\d.*\d)"' pattern is enough for the current data set

or

REGEXP_REPLACE() with capture group \2 as

SELECT REGEXP_REPLACE(url,'^(.*Version=")([^"]*).*','\2') AS version        
  FROM t

Demo

  • Related