asynchronously call postgresql from python-CodePudding

I have a system written in python and running entirely in aws lambda which talks to a postgres database. In the database, there's a bunch of materialized views, which take about an hour all up to fully refresh. I have a function to refresh all the views.

What I can't figure out - is there any way I can trigger the refresh from the lambda? ie - I want to basically connect to the database, start the refresh, and finish - so the lambda session itself only takes like a second - even though the actual refresh takes a long time. Sort of like a "nohup" equivalent inside the db?

CodePudding user response：

no, not possible. postgresql needs an open connection, else it cancels the query.

CodePudding user response：

Yes, it seems possible. You should be able to just open a connection, issue a refresh and then kill the client - the database won't know or care until it completes the refresh. It also won't actively check before it finishes - unless it's configured to do so.

As a test, I created a materialized view that uses a long-running function and saves its last refresh timestamp:

DROP FUNCTION IF EXISTS long_running_function() CASCADE;
CREATE OR replace FUNCTION long_running_function()
RETURNS timestamp LANGUAGE plpgsql
AS $$
BEGIN
    PERFORM pg_sleep(20);
    RETURN now()::timestamp;
END $$;

DROP MATERIALIZED VIEW IF EXISTS long_refreshing_matview;
CREATE MATERIALIZED VIEW long_refreshing_matview AS 
SELECT  long_running_function() as "last_refresh_timestamp";

SELECT * FROM long_refreshing_matview;
--   last_refresh_timestamp
----------------------------
-- 2021-11-22 08:21:21.241476
--(1 row)

Then I asked it to refresh from a background psql process:

nohup psql -h 127.0.0.1 -p 5432 -U user dbname -c "REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query" &

Verified it's running on my database:

SELECT pid, now()-xact_start duration, state, query 
FROM pg_stat_activity 
WHERE query ILIKE '%my_refresh_query%' and pid<>pg_backend_pid();
--  pid  |    duration     | state  |                                query
--------- ----------------- -------- ----------------------------------------------------------------------
-- 27083 | 00:00:06.627445 | active | REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query
--(1 row)

and before the 20 seconds elapsed, I killed the psql process that issued the refresh. I checked again and while the psql process died immediately, its PostgreSQL backend created on the database to serve its query continued running to completion:

SELECT pid,now()-xact_start duration,state,query FROM pg_stat_activity WHERE query ILIKE '%my_refresh_query%' and pid<>pg_backend_pid();
--  pid  |    duration     | state  |                                query
--------- ----------------- -------- ----------------------------------------------------------------------
-- 27083 | 00:00:17.387802 | active | REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query
--(1 row)
SELECT * FROM long_refreshing_matview; --I ran this after the one above completed and disappeared from pg_stat_activity
--   last_refresh_timestamp
------------------------------
-- 2021-11-22 08:24:18.927688
--(1 row)

I've tested it on PostgreSQL 14.1. Depending on your database setup and config you might need to modify client_connection_check_interval and related settings to prevent db from killing off your queries after the serverless client hangs up. Here is a related question over at postgresql mailing list, although a bit old. Here I found some more on both keepalive and "detectdead", up to date.