Home > Blockchain >  asynchronously call postgresql from python
asynchronously call postgresql from python

Time:11-23

I have a system written in python and running entirely in aws lambda which talks to a postgres database. In the database, there's a bunch of materialized views, which take about an hour all up to fully refresh. I have a function to refresh all the views.

What I can't figure out - is there any way I can trigger the refresh from the lambda? ie - I want to basically connect to the database, start the refresh, and finish - so the lambda session itself only takes like a second - even though the actual refresh takes a long time. Sort of like a "nohup" equivalent inside the db?

CodePudding user response:

no, not possible. postgresql needs an open connection, else it cancels the query.

CodePudding user response:

Yes, it seems possible. You should be able to just open a connection, issue a refresh and then kill the client - the database won't know or care until it completes the refresh. It also won't actively check before it finishes - unless it's configured to do so.

As a test, I created a materialized view that uses a long-running function and saves its last refresh timestamp:

DROP FUNCTION IF EXISTS long_running_function() CASCADE;
CREATE OR replace FUNCTION long_running_function()
RETURNS timestamp LANGUAGE plpgsql
AS $$
BEGIN
    PERFORM pg_sleep(20);
    RETURN now()::timestamp;
END $$;

DROP MATERIALIZED VIEW IF EXISTS long_refreshing_matview;
CREATE MATERIALIZED VIEW long_refreshing_matview AS 
SELECT  long_running_function() as "last_refresh_timestamp";

SELECT * FROM long_refreshing_matview;
--   last_refresh_timestamp
----------------------------
-- 2021-11-22 08:21:21.241476
--(1 row)

Then I asked it to refresh from a background psql process:

nohup psql -h 127.0.0.1 -p 5432 -U user dbname -c "REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query" &

Verified it's running on my database:

SELECT pid, now()-xact_start duration, state, query 
FROM pg_stat_activity 
WHERE query ILIKE '%my_refresh_query%' and pid<>pg_backend_pid();
--  pid  |    duration     | state  |                                query
--------- ----------------- -------- ----------------------------------------------------------------------
-- 27083 | 00:00:06.627445 | active | REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query
--(1 row)

and before the 20 seconds elapsed, I killed the psql process that issued the refresh. I checked again and while the psql process died immediately, its PostgreSQL backend created on the database to serve its query continued running to completion:

SELECT pid,now()-xact_start duration,state,query FROM pg_stat_activity WHERE query ILIKE '%my_refresh_query%' and pid<>pg_backend_pid();
--  pid  |    duration     | state  |                                query
--------- ----------------- -------- ----------------------------------------------------------------------
-- 27083 | 00:00:17.387802 | active | REFRESH MATERIALIZED VIEW long_refreshing_matview;--my_refresh_query
--(1 row)
SELECT * FROM long_refreshing_matview; --I ran this after the one above completed and disappeared from pg_stat_activity
--   last_refresh_timestamp
------------------------------
-- 2021-11-22 08:24:18.927688
--(1 row)

I've tested it on PostgreSQL 14.1. Depending on your database setup and config you might need to modify client_connection_check_interval and related settings to prevent db from killing off your queries after the serverless client hangs up. Here is a related question over at postgresql mailing list, although a bit old. Here I found some more on both keepalive and "detectdead", up to date.

  • Related