I have a column containing filenames where I am trying to isolate the vaccine portion of the text. I need to cut off the beginning of the string up to the underscore. Then I need to cut of the file extension including the period, which could be four or five characters. I have the first part working, but it is not doing the second part. What am I missing? Should I use regexp instead of my 2nd instr? If so, how?
Here are the original values.
4212406_Meningitis.jpg
4824729_Hep-B.jpg
3612290_Hep-B.jpg
2811504_Covid-19.jpeg
621980_Covid-19.pdf
5258652_MMR.jpeg
5755663_Meningitis.png
2555841_Covid-19.PNG
2677160_MMR.jpg
2294961_MMR.jpg
SELECT original_field,
SUBSTR(original_field, INSTR(original_field, '_') 1, INSTR(original_field, '.') -1) AS current_field
FROM my_table
CodePudding user response:
SUBSTR
has the signature SUBSTR(value, start_position, substring_length)
(rather than having the third argument as end position). You can use:
SELECT original_field,
SUBSTR(original_field, start_pos, end_pos - start_pos) AS current_field
FROM (
SELECT original_field,
INSTR(original_field, '_') 1 AS start_pos,
INSTR(original_field, '.', -1) AS end_pos
FROM my_table
);
or:
SELECT original_field,
SUBSTR(
original_field,
INSTR(original_field, '_') 1,
INSTR(original_field, '.', -1) - INSTR(original_field, '_') - 1
) AS current_field
FROM my_table;
or, shorter to type but slower to execute:
SELECT original_field,
REGEXP_SUBSTR(original_field, '_(.*)\.', 1, 1, NULL, 1) AS current_field
FROM my_table;
Which, for the sample data:
CREATE TABLE my_table ( original_field ) AS
SELECT '4212406_Meningitis.jpg' FROM DUAL UNION ALL
SELECT '12345_Small_Pox.version_2.jpg' FROM DUAL;
All output:
ORIGINAL_FIELD | CURRENT_FIELD |
---|---|
4212406_Meningitis.jpg | Meningitis |
12345_Small_Pox.version_2.jpg | Small_Pox.version_2 |
Or, if you want the first .
character instead of the last then:
SELECT original_field,
SUBSTR(original_field, start_pos, end_pos - start_pos) AS current_field
FROM (
SELECT original_field,
INSTR(original_field, '_') 1 AS start_pos,
INSTR(original_field, '.') AS end_pos
FROM my_table
);
or:
SELECT original_field,
SUBSTR(
original_field,
INSTR(original_field, '_') 1,
INSTR(original_field, '.') - INSTR(original_field, '_') - 1
) AS current_field
FROM my_table;
or:
SELECT original_field,
REGEXP_SUBSTR(original_field, '_(.*?)\.', 1, 1, NULL, 1) AS current_field
FROM my_table;
Which all 3 output:
ORIGINAL_FIELD | CURRENT_FIELD |
---|---|
4212406_Meningitis.jpg | Meningitis |
12345_Small_Pox.version_2.jpg | Small_Pox |