Home > Mobile >  how to remove DISTINCT from query, in Postgresql
how to remove DISTINCT from query, in Postgresql

Time:11-05

I have the following view:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT DISTINCT ON (coin) * 
from {accountId}.{tableAssetsName} 
ORDER BY coin, ts DESC;

The view returns, for each coin, the latest update.

and, in an unrelated question, a comment was:

Note: DISTINCT is always a red flag. (almost always)

I read about this and it seems to have a performance issue with Postgres. In my scenario, I don't have any issues with performance, but I would like to understand:

how could such a query be rewritten to not use DISTINCT then?

CodePudding user response:

There is a substantial difference between DISTINCT and the Postgres proprietary DISTINCT ON () operator.

Does whoever wrote that "DISTINCT is always a red flag" actually know the difference between DISTINCT and DISTINCT ON in Postgres?

The problem that your view solves, is known as and in Postgres distinct on is typically more efficient than the alternatives.

The problem can be solved with e.g. window functions:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT ....
FROM (
  SELECT *, 
         row_number() over (partition by coin order by ts desc) as rn
  from {accountId}.{tableAssetsName}   
) t 
WHERE rn = 1;

But I wouldn't be surprised if that is actually slower than you current solution.

  • Related