How to reuse a computed value multiple times?-CodePudding

Basically I just want a simple way of finding the most recent date in a table, saving it as a variable, and reusing that variable in the same query.

Right now this is how I'm doing it:

with recent_date as (
  select max(date)
  from mytable
)

select *
from mytable
where date = (select * from recent_date)

(For this simple example, a variable is overkill, but in my real-world use-case I reuse the recent date multiple times in the same query.)

But that feels cumbersome. It would be a lot cleaner to save the recent date to a variable rather than a table and having to select from it.

In pseudo-code, something like this would be nice:

$recent_date = (select max(date) from mytable)

select *
from mytable
where date = $recent_date

Is there something like that in Postgres?

CodePudding user response：

Keep in mind that in SQL you cannot directly declare a variable. Basically a CTE is creating variable (or a set of) and in SQL to use a variable you select it. However, if you want to avoid that structure you can just get the variable directl from a subset directly.

select *
  from mytable
 where date = (select max(date) from mytable);

CodePudding user response：

Better for the simple case

For the scope of a single query, CTEs are a good tool. In my hands the query would look like this:

WITH recent(date) AS (SELECT max(date) FROM mytable)
SELECT m.*
FROM   recent r
JOIN   mytable m USING (date)

Except that the actual example query would burn down to this in my hands:

SELECT *
FROM   mytable
ORDER  BY date DESC NULLS LAST
FETCH  FIRST 1 ROWS WITH TIES;

NULLS LAST only if there can be NULL values. See:

Sort by column ASC, but NULL values first?

WITH TIES only if date isn't UNIQUE NOT NULL. See:

Get top row(s) with highest value, with ties

In combination with an index on mytable (date) (or more specific), this produces the best possible query plan. Look no further.

No, I need variables!

If you positively need variables scoped for the same command, transaction, session or more, there are various options.

The closest thing to "variables" in SQL in Postgres are "customized options". See:

User defined variables in PostgreSQL

You can only store text, any other type has to be cast (and cast back on retrieval).
To set and retrieve a value from within a query, use the Configuration Settings Functions set_config() and current_setting():

SELECT set_config('foo.recent', max(date)::text, false) FROM mytable;

SELECT *
FROM   mytable
WHERE  date = current_setting('foo.recent')::date;

Typically, there are more efficient ways.

If you need that "recent date" a lot, consider a simple function as "global variable", usable by all transactions in all sessions (but each new command sees its own current state):

CREATE FUNCTION f_recent_date()
  RETURNS date
  LANGUAGE sql STABLE PARALLEL SAFE AS
'SELECT max(date) FROM mytable';

STABLE is a valid volatility setting as the function returns the same result within the same query. Be sure to actually make it STABLE, so Postgres does not evaluate repeatedly. In Postgres 9.6 or later, also make it PARALLEL SAFE. Then your query becomes:

SELECT * FROM mytable WHERE date = f_recent_date();

More options:

Typically, if I need variables in Postgres, I use a PL/pgSQL code block in a function, a procedure, or a DO statement for ad-hoc use without the need to return rows:

DO
$do$
DECLARE
   _recent_date date := (SELECT max(date) FROM mytable);
BEGIN
   PERFORM * FROM mytable WHERE date = _recent_date;
   -- more queries using _recent_date ...
END
$do$;

PL/pgSQL may be what you should be using to begin with. Further reading:

When to use stored procedure / user-defined function?