Home > Software engineering >  Extract the YEAR from a column set as a string with different formates for the dates in SQL
Extract the YEAR from a column set as a string with different formates for the dates in SQL

Time:10-11

enter image description hereI need help getting this done, since I'm pretty new to SQL and I'm using Bigquery I have this dataset shown in the screenshot, and I have been trying to extract only the year out of the column "Quality_Review___Dates_of_Review" but as shown in the red rectangle, the dates are not in a consistent format, also the column itself is formatted as a STRING!

I tried to CAST to date, datetime, extract, and even substring but everytime I get a different error because of the incosistency and the error in the date inputs itself

CodePudding user response:

Based on your sample data, where you have only 2 formats (starting with a number or starting with a letter), then this works.

The case expression will check if the date starts with a letter. If so, then take the right last 4, else take the first 4 as the year.

with my_data as (
  select 1 as id, '2022-10-20T00:00:00.000' as qdate union all
  select 2, 'December 3-4, 2013'
)
select id, qdate, 
  case when regexp_contains(left(qdate,1), r'[a-zA-Z]') is true
   then right(qdate, 4) 
   else left(qdate,4) 
  end as q_year
from my_data

--Output--
id  qdate                    q_year
1   2022-10-20T00:00:00.000  2022
2   "December 3-4, 2013"     2013
  • Related