I am trying to build a summary table in our Postgres database that contains information about when various materialized views were refreshed. I would also like this table to trigger the actual refreshes.
The desired format for the table is as below, call it mv_refresh_monitor
:
view_name | refresh_time_start | refresh_time_end |
---|---|---|
view_one | 2022-02-01 22:10:59.234567 | 2022-02-01 22:11:59.234567 |
The table shows view_one
was last refreshed late at night February 1st, and the refresh took 1 minute to complete.
What I would like to do is trigger the materialized refreshes by updating the refresh_time_start
field; doing so would trigger the materialized view in the view_name
field to refresh, and then also update the same row's refresh_time_end
field to capture the time when the refresh is done.
My current implementation uses a function (to update the monitor table), a trigger function (to both refresh the view and call the function), and a trigger on the monitor table to call the trigger function. This is scoped only for a single materialized view:
CREATE OR REPLACE FUNCTION "ingested_digital_spend"."timestamp_refresh_end"()
RETURNS "pg_catalog"."void" AS $BODY$
UPDATE schema.mv_refresh_monitor SET refresh_time_end = CURRENT_TIMESTAMP AT TIME ZONE 'America/Los_Angeles';
$BODY$
LANGUAGE SQL;
CREATE OR REPLACE FUNCTION "schema"."refresh_materialized_view"()
RETURNS "pg_catalog"."trigger" AS $BODY$
BEGIN
IF NEW.view_name = 'view_one' AND NEW.refresh_time_start IS DISTINCT FROM OLD.refresh_time_start THEN
REFRESH MATERIALIZED VIEW schema.view_one;
PERFORM timestamp_refresh_end();
END IF;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER "refresh_mv" AFTER UPDATE ON "schema"."mv_refresh_monitor"
FOR EACH ROW
EXECUTE PROCEDURE "schema"."refresh_materialized_view"();
This almost works, but I'm trying to improve / fix three things:
- Make this parameterized so that I don't have to write new
IF THEN END IF
clauses in the trigger function each time I add a new materialized view to the schema.- This seems like it shouldn't be terribly hard, I just haven't figured out the right way to parameterize PL/pgSQL functions yet.
- Currently the time recorded in
refresh_time_start
andrefresh_time_end
are identical, despite the refresh operation itself taking 80 seconds. I am not sure how to scope theCURRENT_TIMESTAMP
operations so that they don't evaluate to the same timestamp when the trigger function is initially called.- This feels like it should be possible, but I'm not as sure of this.
- I would like the actual refresh to happen "in the background" if possible. That is, for the UPDATE to complete immediately and release the session that is performing the UPDATE. Right now the transaction on the monitor table doesn't complete until the view REFRESH transaction itself completes, so the client session hangs until the refresh completes.
- This might be impossible.
Any suggestions or solutions for getting closer to these three requirements?
CodePudding user response:
I think you are going about this the wrong way by pushing to much into the trigger/function. I would go with:
A function that you provide the view name and start name to. It does the
REFRESH MATERIALIZED VIEW some_view>
and updatesmv_refresh_monitor
with information. For information on how to parametrize this see Dynamic QueriesFor the
CURRENT_TIMESTAMP
issue see Current date/time.CURRENT_TIMESTAMP
by design captures the timestamp at the start of a transaction and does not change in the transaction. You are looking fortransaction_timestamp()/statement_timestamp()
.If you don't tie the
REFRESH MATERIALIZED VIEW
to theUPDATE
you eliminate this issue.