Home > Software engineering >  Oracle substring function of variable length
Oracle substring function of variable length

Time:11-15

I have a column containing filenames where I am trying to isolate the vaccine portion of the text. I need to cut off the beginning of the string up to the underscore. Then I need to cut of the file extension including the period, which could be four or five characters. I have the first part working, but it is not doing the second part. What am I missing? Should I use regexp instead of my 2nd instr? If so, how?

Here are the original values.

4212406_Meningitis.jpg
4824729_Hep-B.jpg
3612290_Hep-B.jpg
2811504_Covid-19.jpeg
621980_Covid-19.pdf
5258652_MMR.jpeg
5755663_Meningitis.png
2555841_Covid-19.PNG
2677160_MMR.jpg
2294961_MMR.jpg
SELECT original_field, 
       SUBSTR(original_field, INSTR(original_field, '_')   1, INSTR(original_field, '.') -1) AS current_field
FROM my_table 

enter image description here

CodePudding user response:

SUBSTR has the signature SUBSTR(value, start_position, substring_length) (rather than having the third argument as end position). You can use:

SELECT original_field, 
       SUBSTR(original_field, start_pos, end_pos - start_pos) AS current_field
FROM   (
  SELECT original_field,
         INSTR(original_field, '_')   1 AS start_pos,
         INSTR(original_field, '.', -1) AS end_pos
  FROM   my_table
);

or:

SELECT original_field, 
       SUBSTR(
         original_field,
         INSTR(original_field, '_')   1,
         INSTR(original_field, '.', -1) - INSTR(original_field, '_') - 1
       ) AS current_field
FROM   my_table;

or, shorter to type but slower to execute:

SELECT original_field, 
       REGEXP_SUBSTR(original_field, '_(.*)\.', 1, 1, NULL, 1) AS current_field
FROM   my_table;

Which, for the sample data:

CREATE TABLE my_table ( original_field ) AS
SELECT '4212406_Meningitis.jpg' FROM DUAL UNION ALL
SELECT '12345_Small_Pox.version_2.jpg' FROM DUAL;

All output:

ORIGINAL_FIELD CURRENT_FIELD
4212406_Meningitis.jpg Meningitis
12345_Small_Pox.version_2.jpg Small_Pox.version_2

Or, if you want the first . character instead of the last then:

SELECT original_field, 
       SUBSTR(original_field, start_pos, end_pos - start_pos) AS current_field
FROM   (
  SELECT original_field,
         INSTR(original_field, '_')   1 AS start_pos,
         INSTR(original_field, '.') AS end_pos
  FROM   my_table
);

or:

SELECT original_field, 
       SUBSTR(
         original_field,
         INSTR(original_field, '_')   1,
         INSTR(original_field, '.') - INSTR(original_field, '_') - 1
       ) AS current_field
FROM   my_table;

or:

SELECT original_field, 
       REGEXP_SUBSTR(original_field, '_(.*?)\.', 1, 1, NULL, 1) AS current_field
FROM   my_table;

Which all 3 output:

ORIGINAL_FIELD CURRENT_FIELD
4212406_Meningitis.jpg Meningitis
12345_Small_Pox.version_2.jpg Small_Pox

fiddle

  • Related