how to remove DISTINCT from query, in Postgresql-CodePudding

I have the following view:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT DISTINCT ON (coin) * 
from {accountId}.{tableAssetsName} 
ORDER BY coin, ts DESC;

The view returns, for each coin, the latest update.

and, in an unrelated question, a comment was:

Note: DISTINCT is always a red flag. (almost always)

I read about this and it seems to have a performance issue with Postgres. In my scenario, I don't have any issues with performance, but I would like to understand:

how could such a query be rewritten to not use DISTINCT then?

CodePudding user response：

There is a substantial difference between DISTINCT and the Postgres proprietary DISTINCT ON () operator.

Does whoever wrote that "DISTINCT is always a red flag" actually know the difference between DISTINCT and DISTINCT ON in Postgres?

The problem that your view solves, is known as greatest-n-per-group and in Postgres distinct on is typically more efficient than the alternatives.

The problem can be solved with e.g. window functions:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT ....
FROM (
  SELECT *, 
         row_number() over (partition by coin order by ts desc) as rn
  from {accountId}.{tableAssetsName}   
) t 
WHERE rn = 1;

But I wouldn't be surprised if that is actually slower than you current solution.