I have a system written in python and running entirely in aws lambda which talks to a postgres database. In the database, there's a bunch of materialized views, which take about an hour all up to fully refresh. I have a function to refresh all the views.
What I can't figure out - is there any way I can trigger the refresh from the lambda? ie - I want to basically connect to the database, start the refresh, and finish - so the lambda session itself only takes like a second - even though the actual refresh takes a long time. Sort of like a "nohup" equivalent inside the db?
CodePudding user response:
no, not possible. postgresql needs an open connection, else it cancels the query.
CodePudding user response:
Yes, it seems possible. You should be able to just open a connection, issue a refresh and then kill the client - the database won't know or care until it completes the refresh. It also won't actively check before it finishes - unless it's configured to do so.
As a test, I created a materialized view that uses a long-running function and saves its last refresh timestamp:
DROP FUNCTION IF EXISTS long_running_function() CASCADE;
CREATE OR replace FUNCTION long_running_function()
RETURNS timestamp LANGUAGE plpgsql
AS $$
BEGIN
PERFORM pg_sleep(20);
RETURN now()::timestamp;
END $$;
DROP MATERIALIZED VIEW IF EXISTS long_refreshing_matview;
CREATE MATERIALIZED VIEW long_refreshing_matview AS
SELECT long_running_function() as "last_refresh_timestamp";
SELECT * FROM long_refreshing_matview;
-- last_refresh_timestamp
----------------------------
-- 2021-11-22 08:21:21.241476
--(1 row)
Then I asked it to refresh from a background psql process:
nohup psql -h 127.0.0.1 -p 5432 -U user dbname -c "REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query" &
Verified it's running on my database:
SELECT pid, now()-xact_start duration, state, query
FROM pg_stat_activity
WHERE query ILIKE '%my_refresh_query%' and pid<>pg_backend_pid();
-- pid | duration | state | query
--------- ----------------- -------- ----------------------------------------------------------------------
-- 27083 | 00:00:06.627445 | active | REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query
--(1 row)
and before the 20 seconds elapsed, I killed the psql process that issued the refresh. I checked again and while the psql process died immediately, its PostgreSQL backend created on the database to serve its query continued running to completion:
SELECT pid,now()-xact_start duration,state,query FROM pg_stat_activity WHERE query ILIKE '%my_refresh_query%' and pid<>pg_backend_pid();
-- pid | duration | state | query
--------- ----------------- -------- ----------------------------------------------------------------------
-- 27083 | 00:00:17.387802 | active | REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query
--(1 row)
SELECT * FROM long_refreshing_matview; --I ran this after the one above completed and disappeared from pg_stat_activity
-- last_refresh_timestamp
------------------------------
-- 2021-11-22 08:24:18.927688
--(1 row)
I've tested it on PostgreSQL 14.1. Depending on your database setup and config you might need to modify client_connection_check_interval and related settings to prevent db from killing off your queries after the serverless client hangs up. Here is a related question over at postgresql mailing list, although a bit old. Here I found some more on both keepalive and "detectdead", up to date.