Home > Software design >  Oracle: instr substr instead of regexp_substr
Oracle: instr substr instead of regexp_substr

Time:11-05

I got this query from another post I made which uses REGEXP_SUBSTR() to pull out specific information from a string in oracle. It works good but only for small sets of data. When it comes to tables that have 300,000 records, it is very slow and I was doing some reading that instr substr might be faster. The example query is:

SELECT REGEXP_SUBSTR(value, '(^|\|)\s*24=\s*(.*?)\s*(\||$)',  1, 1, NULL, 2)  AS "24",
       REGEXP_SUBSTR(value, '(^|\|)\s*35=\s*(.*?)\s*(\||$)',  1, 1, NULL, 2)  AS "35",
       REGEXP_SUBSTR(value, '(^|\|)\s*47A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "47A",
       REGEXP_SUBSTR(value, '(^|\|)\s*98A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "98A"
FROM   table_name

Table example:

CREATE TABLE table_name (value ) AS
SELECT '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' FROM DUAL UNION ALL
SELECT '24= 123.00 | 98A= This is a comment | 47A= Derick |' FROM DUAL

Output of query would be:

24 35 47A 98A
88234.00 Shawn This is a comment
123.00 Derick This is a comment

Can someone give me an example of how this same query would look if I was doing instr substr instead?

Thank you.

CodePudding user response:

SELECT CASE 
       WHEN start_24 > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_24   5,
                INSTR(value, '|', start_24   5) - (start_24 5)
              )
           )
       END AS "24",
       CASE 
       WHEN start_35 > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_35   5,
                INSTR(value, '|', start_35   5) - (start_35 5)
              )
           )
       END AS "35",
       CASE 
       WHEN start_47a > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_47a   6,
                INSTR(value, '|', start_47a   6) - (start_47a 6)
              )
           )
       END AS "47A",
       CASE 
       WHEN start_98a > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_98a   6,
                INSTR(value, '|', start_98a   6) - (start_98a 6)
              )
           )
       END AS "98A"
FROM   (
  SELECT value,
         INSTR(value, '| 24=') AS start_24,
         INSTR(value, '| 35=') AS start_35,
         INSTR(value, '| 47A=') AS start_47a,
         INSTR(value, '| 98A=') AS start_98a
  FROM   (
    SELECT '| ' || value AS value FROM table_name
  )
);

Which, for your sample data, outputs:

24 35 47A 98A
88234.00 Shawn This is a comment
123.00 Derick This is a comment

db<>fiddle here

CodePudding user response:

Given the data in your example it seems you could also use a procedural approach for your data extraction, but I'm sceptical if this could be faster.

The following function get24 for example extracts the columns "24" just using INSTR and SUBSTR.

CREATE OR REPLACE FUNCTION get24(value IN VARCHAR2) RETURN VARCHAR2
IS
    i PLS_INTEGER;
    s VARCHAR2(32767);
BEGIN
  i := INSTR(value, '24= ');
  IF (i <> 1) THEN
    RETURN NULL;
  END IF;
  s := SUBSTR(value, i   4);
  i := INSTR(s, ' | ');
  IF (i = 0) THEN
    RETURN NULL;
  END IF;
  RETURN SUBSTR(s, 1, i - 1);
END;
/

SELECT get24(value) "24" FROM table_name;

You could then also try using a pipelined function and do all the data extraction within the pipelined function.

  • Related