Home > front end >  Parse dynamic substring SQL
Parse dynamic substring SQL

Time:09-20

Querying data in a Postgres SQL database..

I have a column job_name with data like so, where I want to parse the year out of the string (the first 4 numbers)..

AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS

In this case I want to pull 2017

This works fine and pulls the rows where the 4 digits are >= 2019...

select 
    job_id, 
    cast(substring(job_name, 4, 4) as integer) as year 
from job_header 
where 
    job_type = 'PAMA' 
    and cast(substring(job_name, 4, 4) as integer) >= 2019

The issue is we have some data where the string is formatted slightly differently, and there are 3 characters in the beginning, or 4 (instead of 2), followed by an underscore like this...

APR3_2018_Another_Test_SC_PAMA

I still want to pull the first 4 numbers (so 2018 in this case), cast to int etc...Is there a way to make my first query dynamic to handle this?

CodePudding user response:

You can use a regular expression to find the first group of 4 digits, e.g.:

with job_header(job_id, job_name, job_type) as (
values
    (1, 'AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS', 'PAMA'),
    (2, 'APR3_2018_Another_Test_SC_PAMA', 'PAMA')
)
select 
    job_id, 
    cast(substring(job_name from '\d{4}') as int) as year 
from job_header 
where job_type = 'PAMA'

 job_id | year
-------- ------
      1 | 2017
      2 | 2018
(2 rows)

Read about POSIX Regular Expressions in the documentation.

  • Related