I'm developing a pipeline that communicates a lot with my Postgres database.
I'm trying to figure out how I can create a "dry run" mode for my pipeline that uses as few resources as possible, just to check whether the pipeline would run properly.
I understand that I could just not commit my changes to the database at the end of my script, but it takes about an hour for my pipeline to run all the way through. So it'd be cool if I could simulate a lot of my Postgres queries without actually taking the time to run all of them.
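(For reference, the "just don't commit" approach I mean is something like this, with made-up table names:)

```sql
-- Wrap the whole pipeline in one transaction and discard it at the end.
BEGIN;

-- ...all the pipeline's statements run here, e.g. (made-up tables):
INSERT INTO staging.events SELECT * FROM raw.events;
UPDATE staging.events SET processed = true;

-- Instead of COMMIT, throw all the work away:
ROLLBACK;
```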
I'll give you an extremely simple example so you can see what I mean...
Let's say the only thing my pipeline does is refresh a bunch of materialized views. But they take a long time to refresh, and there are a lot of them. So I'd like to simply ask the Postgres db... "hey, if I ran these refreshes right now, would they go through correctly?"
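Something like this, except with many more views (names are made up):

```sql
REFRESH MATERIALIZED VIEW analytics.daily_sales;      -- each of these
REFRESH MATERIALIZED VIEW analytics.user_cohorts;     -- can take many
REFRESH MATERIALIZED VIEW analytics.inventory_rollup; -- minutes to run
```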
I feel like this is kinda what happens during the "query planning" phase, but this is just an intuition.
I also acknowledge that the only way to know for absolute sure is to actually run it, but let's just say for now that I know and accept the risks that come along with that.
CodePudding user response:
For `INSERT`, `UPDATE`, `DELETE` and `SELECT` statements, you could prepend `EXPLAIN` to the statement. That won't return the proper result set, but when it succeeds, the statement should run (unless there is a runtime error).
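A minimal sketch of that, assuming a hypothetical table `my_table(id int, name text)`:

```sql
-- Plain EXPLAIN parses and plans the statement but does not execute it,
-- so missing tables/columns, type mismatches, etc. surface immediately.
EXPLAIN INSERT INTO my_table (id, name) VALUES (1, 'test');
EXPLAIN UPDATE my_table SET name = 'renamed' WHERE id = 1;
EXPLAIN SELECT id, name FROM my_table WHERE id = 1;
```

(Note that `EXPLAIN ANALYZE` would actually execute the statement, so avoid that variant in a dry run.)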
However, that does not work with `REFRESH MATERIALIZED VIEW`...
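`REFRESH MATERIALIZED VIEW` is not one of the statements `EXPLAIN` accepts, so this fails to parse rather than producing a plan (view name made up):

```sql
EXPLAIN REFRESH MATERIALIZED VIEW analytics.daily_sales;  -- syntax error: not an explainable statement
```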
I guess the best option you have is to use a database that has all the tables, views, functions etc., but is empty. Most statements will complete very quickly on an empty database.
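One way to build such a clone (database names are placeholders) is a schema-only dump and restore:

```bash
# Create a scratch database containing only the DDL: tables, views,
# materialized views, functions -- but none of the data.
createdb pipeline_dry_run
pg_dump --schema-only --no-owner my_production_db | psql -d pipeline_dry_run
```

Pointing the pipeline at that database makes the `REFRESH MATERIALIZED VIEW` statements run against empty tables, so they finish almost immediately while still exercising the same SQL.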