I have a source table in BigQuery into which data is ingested from CSV files. The source table has a column named "ingestDate" of type DATETIME. The destination table's structure is exactly the same as the source.
SOURCE TABLE:
ID | ingestDate | Column1 |
---|---|---|
113 | 2022-06-02T00:00:00 | Value4 |
112 | null | Value3 |
111 | 2022-06-01T00:00:00 | Value2 |
110 | 2022-05-31T00:00:00 | Value1 |
DESTINATION TABLE:
ID | ingestDate | Column1 |
---|---|---|
111 | 2022-06-01T00:00:00 | Value2 |
110 | 022-05-31T00:00:00 | Value1 |
I'm need to write a stored procedure to do the following:
- Copy all the latest values into destination table from the source table. (This is simple enough by comparing the timestamps and I can get this done)
- If there are any null values in column "ingestDate" within the source table, then throw an error and exit the stored procedure without mapping any data at all.
Here is some additional information:
- Column "ingestDate" does not have a NOT NULL clause, and there is no way for me to modify the structure of the table to include one.
- I am also trying to get the procedure to return the number of new rows inserted.
This is the procedure I have so far:
CREATE OR REPLACE PROCEDURE dataset.sample1()
BEGIN
DECLARE sourceCount, initDestCount, finalDestCount INT64;
DECLARE latestIngestDate DATETIME;
SET sourceCount = (SELECT COUNT(*) FROM dataset.sourceTable);
SET initDestCount = (SELECT COUNT(*) FROM dataset.destTable);
SET latestIngestDate = (SELECT MAX(ingestDate) FROM dataset.destTable);
BEGIN
INSERT INTO dataset.destTable (
ID,
ingestDate,
Column1)
SELECT
CAST(ID AS INT64),
CAST(ingestDate AS DATETIME),
CAST(Column1 AS STRING)
FROM
dataset.sourceTable
WHERE
ingestDate > latestIngestDate
END;
SET finalDestCount = (SELECT COUNT(*) FROM dataset.destTable);
SELECT sourceCount, initDestCount, finalDestCount;
END;
Can someone please help me implement the error mechanism that I was looking to implement in the procedure above? Any help with optimizing this script would also be greatly appreciated.
Thank you!
CodePudding user response:
can you do a count of the nulls and if there is one then throw an exception before doing your insert?
DECLARE
e_no_date EXCEPTION;
PRAGMA exception_init( e_no_date, -20001 );
mycount int;
BEGIN
SELECT Count(*)
INTO mycount
FROM
dataset.sourceTable
WHERE
ingestDate is null;
IF mycount > 0 THEN
RAISE e_no_date;
END IF;
CodePudding user response:
change the where like this
WHERE
ingestDate is not null and
ingestDate > latestIngestDate
https://cloud.google.com/bigquery/docs/reference/standard-sql/operators#is_operators
Based on comment -- I'd add a flag to the return, throwing exceptions is VERY slow and generally considered a poor choice for flow of control.
SET errorFlag = SELECT COUNT(*) FROM dataset.sourceTable where ingestDate is null;
SELECT sourceCount, initDestCount, finalDestCount, errorFlag;