I tried to add the following table in PostgreSQL 14.3:
CREATE TABLE client_cache (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
request VARCHAR COMPRESSION lz4 NOT NULL CHECK (LENGTH (request) <= 10240),
request_body BYTEA COMPRESSION lz4 NOT NULL CHECK (LENGTH (request_body) <= 1048576),
request_hash VARCHAR GENERATED ALWAYS AS (MD5(ROW(request::BYTEA, request_body)::VARCHAR)) STORED
);
But Postgres complains:
[42P17] ERROR: generation expression is not immutable
I've seen many answers discussing how to create a GENERATED
column containing the MD5
of a single column, but as soon as you add ROW()
to calculate MD5
over multiple columns, the expression is no longer immutable.
I can create a GENERATED
column using ROW(MD5(A), MD5(B))
but not MD5(ROW(A, B))
.
What can I do instead to create a single MD5
value over multiple columns of varying types (as seen above)?
I know that I can create a view or populate a column using a trigger, but I'd really like to stick to a GENERATED
column if possible.
CodePudding user response:
I suggest an immutable helper function:
CREATE OR REPLACE FUNCTION f_request_md5(_request text, _request_body bytea)
RETURNS uuid
LANGUAGE sql IMMUTABLE COST 10 PARALLEL SAFE AS
'SELECT md5(textin(record_out((md5(_request_body), _request))))::uuid';
And a table like this:
CREATE TABLE client_cache (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
, request text COMPRESSION lz4 NOT NULL CHECK (length(request) <= 10240)
, request_body bytea COMPRESSION lz4 NOT NULL CHECK (length(request_body) <= 1048576)
, request_hash uuid GENERATED ALWAYS AS (f_request_md5(request, request_body)) STORED
);
db<>fiddle here
Note the more efficient uuid
instead of varchar
. See:
Background
There are two overloaded variants of md5()
in Postgres 14 (or any supported version):
test=> SELECT (proargtypes::regtype[])[0], prorettype::regtype, provolatile
test-> FROM pg_proc
test-> WHERE proname = 'md5';
proargtypes | prorettype | provolatile
------------- ------------ -------------
bytea | text | i
text | text | i
(2 rows)
One takes bytea
, one text
, both are IMMUTABLE
and return text
. So this expression is immutable:
ROW(MD5(request), MD5(request_body))
But this is not, like you found out the hard way:
MD5(ROW(A, B)::varchar)
The text representation of a record
is not immutable. There are many reasons. One obvious reason for the case at hand: bytea
output can be in (default) hex
format or in the obsolescent escape
format. A plain
SET bytea_output = 'escape';
... would break your generated column.
To get an immutable text representation of a bytea
value, you'd run it through encode(request_body, 'hex')
. But don't go there. md5(request_body)
gives us a faster immutable text "representation" for our purpose.
We still can't cast a record. So I created the wrapper function. Be sure to read this related answer for more explanation:
Like discussed in that answer, the new, built-in function hash_record_extended()
would be much more efficient for the purpose. So if a bigint
is good enough, consider this:
CREATE TABLE client_cache2 (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
, request text COMPRESSION lz4 NOT NULL CHECK (length(request) <= 10240)
, request_body bytea COMPRESSION lz4 NOT NULL CHECK (length(request_body) <= 1048576)
, request_hash bigint GENERATED ALWAYS AS (hash_record_extended((request, request_body), 0)) STORED
);
same db<>fiddle here
Works out of the box in Postgres 14 or later.
Related:
- Why am I getting a an error when creating a generated column in PostgreSQL?
- Computed / calculated / virtual / derived columns in PostgreSQL
CodePudding user response:
I think I found a solution!
Postgres doesn't like:
request_hash VARCHAR GENERATED ALWAYS AS (MD5(ROW(request, request_body)::VARCHAR)) STORED
but
request_hash VARCHAR GENERATED ALWAYS AS (MD5(request || request_body::VARCHAR)) STORED
works just fine.
Hooray!