For example, we have this table:
id | value  | relation_id
---|--------|------------
1  | value1 | 1
2  | value2 | 2
3  | value3 | 1
4  | value4 | 1
5  | value5 | 3
And I want to get the rows with ids 1, 2, and 5, for example (one row per distinct relation_id). That is easy, but not when the table has several billion rows; it is quite slow even on an SSD. I know every possible relation_id value in advance, so I tried a query like this:
(select value, relation_id from table where relation_id=2 limit 1)
union
(select value, relation_id from table where relation_id=3 limit 1)
-- and so on
But it turns out that PostgreSQL scans the table from the beginning for each subquery, even though it is a single query. Is there a way to write the query so that PostgreSQL makes a single pass over the table and collects the required rows along the way?
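The single pass being asked for can be sketched in plain Python (illustrative only; the in-memory row list stands in for a sequential table scan):

```python
# Single-pass idea: scan the rows once, keeping the first row seen
# for each relation_id. A sketch of the desired behavior, not of
# PostgreSQL internals.
rows = [
    (1, "value1", 1),
    (2, "value2", 2),
    (3, "value3", 1),
    (4, "value4", 1),
    (5, "value5", 3),
]

first_per_relation = {}
for id_, value, relation_id in rows:
    # setdefault stores the row only if relation_id is not seen yet
    first_per_relation.setdefault(relation_id, (id_, value, relation_id))

result = sorted(first_per_relation.values())
print(result)  # [(1, 'value1', 1), (2, 'value2', 2), (5, 'value5', 3)]
```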
CodePudding user response:
Approach 1: Use the PostgreSQL DISTINCT ON clause. Given an ORDER BY clause, it keeps only the first row for each distinct value of the expression(s) inside the parentheses, preventing duplicates.
SELECT DISTINCT ON(relation_id) id_, value_, relation_id
FROM tab
ORDER BY relation_id
Approach 2: Use the ROW_NUMBER window function to rank the records within each "relation_id" partition, then select the first record of each partition (rn = 1):
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY relation_id ORDER BY id_) AS rn
FROM tab
)
SELECT id_, value_, relation_id
FROM cte
WHERE rn = 1
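Approach 2 can be tried without a PostgreSQL server: the stdlib sqlite3 module also supports ROW_NUMBER (SQLite 3.25+). A small demo, reusing the answer's table and column names:

```python
# Demo of the ROW_NUMBER approach against an in-memory SQLite table
# populated with the example data from the question.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id_ INTEGER, value_ TEXT, relation_id INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, "value1", 1), (2, "value2", 2), (3, "value3", 1),
     (4, "value4", 1), (5, "value5", 3)],
)

rows = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY relation_id ORDER BY id_) AS rn
        FROM tab
    )
    SELECT id_, value_, relation_id
    FROM cte
    WHERE rn = 1
    ORDER BY relation_id
""").fetchall()

print(rows)  # one row per relation_id: ids 1, 2, 5
```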
Approach 3: Use ROW_NUMBER inside the filtering construct FETCH FIRST ... ROWS WITH TIES. This behaves like Approach 2 but saves you the subquery: it returns the first row plus all rows tied with it on rownum = 1.
SELECT *
FROM tab
ORDER BY ROW_NUMBER() OVER(PARTITION BY relation_id ORDER BY id_) = 1 DESC
FETCH FIRST 1 ROWS WITH TIES
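The WITH TIES semantics can also be checked outside PostgreSQL. A sketch in Python (using the same assumed example rows) that emulates the boolean sort key and the tie handling:

```python
# Emulate: ORDER BY (rn = 1) DESC FETCH FIRST 1 ROWS WITH TIES.
# Rows whose row number within their relation_id partition is 1 all
# share the top sort key, so WITH TIES returns every one of them.
from itertools import groupby

rows = [(1, "value1", 1), (2, "value2", 2), (3, "value3", 1),
        (4, "value4", 1), (5, "value5", 3)]

# Compute rn = ROW_NUMBER() OVER (PARTITION BY relation_id ORDER BY id_)
numbered = []
for _, group in groupby(sorted(rows, key=lambda r: (r[2], r[0])),
                        key=lambda r: r[2]):
    for rn, row in enumerate(group, start=1):
        numbered.append((row, rn))

# Sort by the boolean key (rn == 1) descending, then keep the first
# row plus every row tied with it on that key.
numbered.sort(key=lambda x: x[1] == 1, reverse=True)
top_key = numbered[0][1] == 1
tied = [row for row, rn in numbered if (rn == 1) == top_key]

print(sorted(tied))  # same rows as Approach 2
```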