Home > front end >  Regexp substr in oracle
Regexp substr in oracle

Time:02-21

Concerta 18 mg/24 hr oral tablet, extended release

Expected Output

tablet, extended release

Input

glipiZIDE 10 mg oral tablet, extended release

Expected Output

tablet, extended release

Input

Adderall XR 10 mg oral capsule, extended release

Expected Output

capsule, extended release

I used the below query and getting tables or capsule as output.

select trim(
         regexp_substr(
           lower('Adderall XR 10 mg oral capsule, extended release' ),
           ' ((caps|tab|powd|syr|aero|liq|susp)[a-z] |solution|lotion|spray|([^a-z0-9])?extended release ) '
         )
       )
from dual;

CodePudding user response:

With as many info you posted (read: no explanation at all), why would you need regular expressions? Simple substr instr returns desired output:

SQL> with test (id, col) as
  2    (select 1, 'Concerta 18 mg/24 hr oral tablet, extended release' from dual union all
  3     select 2, 'glipiZIDE 10 mg inhalation aerosol'                 from dual union all
  4     select 3, 'Adderall XR 10 mg subcutaneous solution'            from dual
  5    )
  6  select id,
  7    substr(col, case when instr(col, 'oral')         > 0 then instr(col, 'oral')   5
  8                     when instr(col, 'inhalation')   > 0 then instr(col, 'inhalation')   11
  9                     when instr(col, 'subcutaneous') > 0 then instr(col, 'subcutaneous')   13
 10                end
 11          ) result
 12  from test;

        ID RESULT
---------- --------------------------------------------------
         1 tablet, extended release
         2 aerosol
         3 solution

SQL>

CodePudding user response:

You can use:

SELECT TRIM(
         REGEXP_SUBSTR(
           LOWER(value),
           -- Leading space
           ' '
           -- Dosage form
           || '((caps|tab|powd|syr|aero|liq|susp)[a-z]*|solution|lotion|spray)'
           -- Optional trailing punctuation then "extended release"
           || '([^a-z0-9]*extended release)?'
         )
       ) AS dosage_form
FROM   table_name;

Which, for your sample data:

CREATE TABLE table_name (value) AS
SELECT 'Concerta 18 mg/24 hr oral tablet, extended release' FROM DUAL UNION ALL
SELECT 'glipiZIDE 10 mg oral tablet, extended release'      FROM DUAL UNION ALL
SELECT 'Adderall XR 10 mg oral capsule, extended release'   FROM DUAL;

Outputs:

DOSAGE_FORM
tablet, extended release
tablet, extended release
capsule, extended release

Alternatively, you can list all the dosage forms in a table:

CREATE TABLE dosage_forms (form) AS
SELECT 'capsule'    FROM DUAL UNION ALL
SELECT 'capslet'    FROM DUAL UNION ALL
SELECT 'caps'       FROM DUAL UNION ALL
SELECT 'tablet'     FROM DUAL UNION ALL
SELECT 'powder'     FROM DUAL UNION ALL
SELECT 'syringe'    FROM DUAL UNION ALL
SELECT 'aerosol'    FROM DUAL UNION ALL
SELECT 'liquid'     FROM DUAL UNION ALL
SELECT 'suspension' FROM DUAL UNION ALL
SELECT 'solution'   FROM DUAL UNION ALL
SELECT 'lotion'     FROM DUAL UNION ALL
SELECT 'spray'      FROM DUAL;

CREATE TABLE dosage_form_modifiers (form_modifier) AS
SELECT 'extended release' FROM DUAL;

then use the query (from Oracle 12):

SELECT *
FROM   table_name t
       LEFT OUTER JOIN LATERAL (
         SELECT REGEXP_SUBSTR(
                  t.value,
                  '( |^)'
                  || '('
                  || form
                  || form_modifier
                  || ')',
                  1,
                  1,
                  'i',
                  2
                ) AS match
         FROM   dosage_forms df
                CROSS JOIN (
                  SELECT '[^a-z0-9]*' || form_modifier AS form_modifier
                  FROM   dosage_form_modifiers
                UNION ALL
                  SELECT NULL FROM DUAL
                ) dfm
         ORDER BY
                LENGTH(match) DESC NULLS LAST,
                match NULLS LAST
         FETCH FIRST ROW ONLY
       )
       ON (1 = 1)

db<>fiddle here

  • Related