Home > Software engineering >  How to convert a hive query with regex to oracle
How to convert a hive query with regex to oracle

Time:03-16

I have this text:

Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples

I just want to get the part after 'Process explanation' but not include 'final activity...'

So like this:

The bottle is then melted to form liquid glass.

This is the current hive query which I want to convert to oracle:

SELECT REGEXP_EXTRACT(
               'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples',
               '.*(process[ \t]*(explanation)?[ \t]*:[ \t]*)(.*?)([ \t]*;[ \t]*final[ \t] activity[ \t] for[ \t] manager.*$|$)',
               3) as extracted
FROM my_table

CodePudding user response:

If those substrings are just like you said, there's a pretty simple option - substr instr functions.

SQL> with test (col) as
  2    (select 'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' from dual)
  3  select substr(col, instr(col, 'Process explanation')   length('Process explanation')   1,
  4                     instr(col, 'Final activity') - instr(col, 'Process explanation') -
  5                       length('Process explanation') - 2
  6               ) result
  7  from test;

RESULT
----------------------------------------------
The bottle is then melted to form liquid glass

SQL>

CodePudding user response:

I've come up with something like this:

with strings as
(SELECT 'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' str FROM DUAL)
SELECT str
, REGEXP_SUBSTR(
               str,
           '(.*process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*)(.*)(;[[:blank:]]*final[[:blank:]]*activity[[:blank:]]*for[[:blank:]]*manager.*$)',
           1, 1, 'i',3)
                as extracted
FROM strings

Resulting in:

STR EXTRACTED
Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples The bottle is then melted to form liquid glass

Assuming matching blank group instead of your space and tab list [ \t] is ok. The only thing different is the last matching: manager.*$|$) - I'm not sure how to interpret the '|$' match, so I skipped it.

  • Related