Home > front end >  Postgres regexp_matches. How to find word after range of words
Postgres regexp_matches. How to find word after range of words

Time:01-24

I want to parse query in postgres I've got string query, for example Select * FROM dm.lib Update sc.qar Join ws.gys

How can I get first word after following pattern words - SELECT, UPDATE, JOIN?

I use this query and get many NULL values.

with src(txt) as (
  values ('Select * FROM dm.lib Update sc.qar Join ws.gys')
)
select unnest(regexp_matches (txt, 'FROM ([^ ] ) | SELECT ([^ ] ) | UPDATE ([^ ] ) | JOIN ([^ ] ) | DELETE ([^ ] ) | MERGE ([^ ] )','g')) 
from src 
where ;

it is important that upper case of lower doesn't matter. My query doesn't work correctly

CodePudding user response:

demo
Probably over engineered. There should be a simple way, but the following is how I do.

WITH src (txt) AS (
    VALUES ('Select * FROM       dm.lib Update     sc.qar Join     ws.gys'))
,cte (keywords) AS (
    SELECT unnest('{FROM,select,UPDATE,JOIN,DELETE,MERGE}'::text[]))
,cte1 AS (
    SELECT
        regexp_instr (lower(txt), lower(keywords) || '\s ') AS BEGIN
    ,lower(keywords) || '\s ' AS pattern
    ,lower(keywords) AS keywords
    ,lower(txt) AS src
FROM    cte, src
)
,cte2 AS (
    SELECT
        *
        ,regexp_instr (src, pattern,BEGIN, 1, 1) AS pattern_end
        ,coalesce(lead(BEGIN) OVER (ORDER BY BEGIN),length(src)   2) 
        - 1 
        - regexp_instr (src, pattern,BEGIN, 1,1) AS gap
FROM cte1 WHERE BEGIN != 0
ORDER BY 1
)
SELECT
    *, substring(src FROM pattern_end FOR gap)
FROM cte2;

src, cte, cte1 used to normalized strings.
Key function is regexp_instr (only in postgres 15: https://www.postgresql.org/docs/current/functions-matching.html).

Key gotcha is that You can use regexp_instr get the match substring begin char position and end char position, It feel like char pointer walk through in C.

  • Related