Example of Text:
PROCEDURE: Cryo balloon antral pulmonary vein
isolation and cavotricuspid isthmus ablation.
The patient is a middle aged and happy.
I am trying to extract "Cryo balloon antral pulmonary vein isolation and cavotricuspid isthmus ablation" from the text.
The Code I used:
TRIM(REGEXP_SUBSTR(a.Document_Text, 'Procedure:\s*(.*)\s*?\.',1,1,'inm',1)) as Text_Procedure_DESC
But what I get is the whole note after 'Procedure:'
FYI:
- There are text fields Prior to and after the Procedure entry.
- There are multiple spaces and vary from note to note.
Not sure where I messed up!
CodePudding user response:
- You want to make the
.*
pattern non-greedy by using.*?
- The
m
option is redundant (as you are not matching the start or end of lines) - and you don't need the
TRIM
function.
Like this:
SELECT REGEXP_SUBSTR(document_text, 'PROCEDURE:\s (.*?)\.', 1, 1, 'in' , 1)
AS Text_Procedure_DESC
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'PROCEDURE: Cryo balloon antral pulmonary vein
isolation and cavotricuspid isthmus ablation.
The patient is a middle aged and happy.' FROM DUAL
Outputs:
TEXT_PROCEDURE_DESC Cryo balloon antral pulmonary vein
isolation and cavotricuspid isthmus ablation
db<>fiddle here