enter image description hereI need help getting this done, since I'm pretty new to SQL and I'm using Bigquery I have this dataset shown in the screenshot, and I have been trying to extract only the year out of the column "Quality_Review___Dates_of_Review" but as shown in the red rectangle, the dates are not in a consistent format, also the column itself is formatted as a STRING!
I tried to CAST to date, datetime, extract, and even substring but everytime I get a different error because of the incosistency and the error in the date inputs itself
CodePudding user response:
Based on your sample data, where you have only 2 formats (starting with a number or starting with a letter), then this works.
The case expression will check if the date starts with a letter. If so, then take the right last 4, else take the first 4 as the year.
with my_data as (
select 1 as id, '2022-10-20T00:00:00.000' as qdate union all
select 2, 'December 3-4, 2013'
)
select id, qdate,
case when regexp_contains(left(qdate,1), r'[a-zA-Z]') is true
then right(qdate, 4)
else left(qdate,4)
end as q_year
from my_data
--Output--
id qdate q_year
1 2022-10-20T00:00:00.000 2022
2 "December 3-4, 2013" 2013