I have a query that outputs data - row by row - and for each record I grab a value from last_modified_date
column that is the latest date in that column BUT is not later than the date
column's value. I save the new value in column custom_last_modified_date
.
My data looks like this:
id date last_modified_date
A 03/01/22 2022-03-02 22:44
A 03/01/22 2022-02-01 05:14
A 03/01/01 2022-02-28 07:49
B 03/02/22 2022-03-20 07:49
B 03/02/22 2022-03-01 04:46
B 03/02/01 2022-02-28 09:24
The output is:
id date custom_last_modified_date
A 03/01/22 02/28/22
A 03/01/22 02/28/22
A 03/01/01 02/28/22
B 03/02/22 03/01/22
B 03/02/22 03/01/22
B 03/02/01 03/01/22
Here is the code:
SELECT
id,
date,
MAX(
IF(
date(
string(
TIMESTAMP(
DATETIME(
parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
),
'America/New_York'
)
)
) <= date,
date(
string(
TIMESTAMP(
DATETIME(
parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
),
'America/New_York'
)
)
),
null
)
) OVER (PARTITION BY date, id) as custom_last_modified_date
FROM `my_table`
All works fine on sample, but because data is not very clean sometimes values in last_modified_date
look like this: "2021-0-3 05:50"
and then error message is: Failed to parse
Is there a way to JUST filter out the right format of the date in where clause or otherwise exclude bad values within the same query?
Thank you.
CodePudding user response:
You can use the SAFE
prefix to ignore the parser errors, then you can filter the null
rows:
-- Returns null if last_modified_date not match the expected pattern
SAFE.parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
From the docs:
If you begin a function with the SAFE. prefix, it will return NULL instead of an error.
[...]
BigQuery supports the use of the SAFE. prefix with most scalar functions that can raise errors, including STRING functions, math functions, DATE functions, DATETIME functions, TIMESTAMP functions, and JSON functions.
More info: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-reference#safe_prefix