I have the following view:
CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT DISTINCT ON (coin) *
from {accountId}.{tableAssetsName}
ORDER BY coin, ts DESC;
The view returns, for each coin, the latest update.
and, in an unrelated question, a comment was:
Note: DISTINCT is always a red flag. (almost always)
I read about this and it seems to have a performance issue with Postgres. In my scenario, I don't have any issues with performance, but I would like to understand:
how could such a query be rewritten to not use DISTINCT then?
CodePudding user response:
There is a substantial difference between DISTINCT
and the Postgres proprietary DISTINCT ON ()
operator.
Does whoever wrote that "DISTINCT is always a red flag" actually know the difference between DISTINCT
and DISTINCT ON
in Postgres?
The problem that your view solves, is known as greatest-n-per-group and in Postgres distinct on
is typically more efficient than the alternatives.
The problem can be solved with e.g. window functions:
CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT ....
FROM (
SELECT *,
row_number() over (partition by coin order by ts desc) as rn
from {accountId}.{tableAssetsName}
) t
WHERE rn = 1;
But I wouldn't be surprised if that is actually slower than you current solution.