Querying data in a Postgres
SQL database..
I have a column job_name
with data like so, where I want to parse the year out of the string (the first 4 numbers)..
AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS
In this case I want to pull 2017
This works fine and pulls the rows where the 4 digits are >= 2019...
select
job_id,
cast(substring(job_name, 4, 4) as integer) as year
from job_header
where
job_type = 'PAMA'
and cast(substring(job_name, 4, 4) as integer) >= 2019
The issue is we have some data where the string is formatted slightly differently, and there are 3 characters in the beginning, or 4 (instead of 2), followed by an underscore like this...
APR3_2018_Another_Test_SC_PAMA
I still want to pull the first 4 numbers (so 2018 in this case), cast to int etc...Is there a way to make my first query dynamic to handle this?
CodePudding user response:
You can use a regular expression to find the first group of 4 digits, e.g.:
with job_header(job_id, job_name, job_type) as (
values
(1, 'AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS', 'PAMA'),
(2, 'APR3_2018_Another_Test_SC_PAMA', 'PAMA')
)
select
job_id,
cast(substring(job_name from '\d{4}') as int) as year
from job_header
where job_type = 'PAMA'
job_id | year
-------- ------
1 | 2017
2 | 2018
(2 rows)
Read about POSIX Regular Expressions in the documentation.