I got this query from another post I made which uses REGEXP_SUBSTR()
to pull out specific information from a string in oracle. It works good but only for small sets of data. When it comes to tables that have 300,000 records, it is very slow and I was doing some reading that instr substr might be faster. The example query is:
SELECT REGEXP_SUBSTR(value, '(^|\|)\s*24=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "24",
REGEXP_SUBSTR(value, '(^|\|)\s*35=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "35",
REGEXP_SUBSTR(value, '(^|\|)\s*47A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "47A",
REGEXP_SUBSTR(value, '(^|\|)\s*98A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "98A"
FROM table_name
Table example:
CREATE TABLE table_name (value ) AS
SELECT '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' FROM DUAL UNION ALL
SELECT '24= 123.00 | 98A= This is a comment | 47A= Derick |' FROM DUAL
Output of query would be:
24 | 35 | 47A | 98A |
---|---|---|---|
88234.00 | Shawn | This is a comment | |
123.00 | Derick | This is a comment |
Can someone give me an example of how this same query would look if I was doing instr substr instead?
Thank you.
CodePudding user response:
SELECT CASE
WHEN start_24 > 0
THEN TRIM(
SUBSTR(
value,
start_24 5,
INSTR(value, '|', start_24 5) - (start_24 5)
)
)
END AS "24",
CASE
WHEN start_35 > 0
THEN TRIM(
SUBSTR(
value,
start_35 5,
INSTR(value, '|', start_35 5) - (start_35 5)
)
)
END AS "35",
CASE
WHEN start_47a > 0
THEN TRIM(
SUBSTR(
value,
start_47a 6,
INSTR(value, '|', start_47a 6) - (start_47a 6)
)
)
END AS "47A",
CASE
WHEN start_98a > 0
THEN TRIM(
SUBSTR(
value,
start_98a 6,
INSTR(value, '|', start_98a 6) - (start_98a 6)
)
)
END AS "98A"
FROM (
SELECT value,
INSTR(value, '| 24=') AS start_24,
INSTR(value, '| 35=') AS start_35,
INSTR(value, '| 47A=') AS start_47a,
INSTR(value, '| 98A=') AS start_98a
FROM (
SELECT '| ' || value AS value FROM table_name
)
);
Which, for your sample data, outputs:
24 35 47A 98A 88234.00 Shawn This is a comment 123.00 Derick This is a comment
db<>fiddle here
CodePudding user response:
Given the data in your example it seems you could also use a procedural approach for your data extraction, but I'm sceptical if this could be faster.
The following function get24
for example extracts the columns "24" just using INSTR
and SUBSTR
.
CREATE OR REPLACE FUNCTION get24(value IN VARCHAR2) RETURN VARCHAR2
IS
i PLS_INTEGER;
s VARCHAR2(32767);
BEGIN
i := INSTR(value, '24= ');
IF (i <> 1) THEN
RETURN NULL;
END IF;
s := SUBSTR(value, i 4);
i := INSTR(s, ' | ');
IF (i = 0) THEN
RETURN NULL;
END IF;
RETURN SUBSTR(s, 1, i - 1);
END;
/
SELECT get24(value) "24" FROM table_name;
You could then also try using a pipelined function and do all the data extraction within the pipelined function.