Delete and keep certain rows based on conditions from table in postgresql-CodePudding

I have a table with this structure :

create table content_relations (
    mainConId    Integer not null,
    relatedConId Integer not null,
    similarity   float not null,
    relatedConAddedOn TIMESTAMP WITH TIME ZONE Not null);

Now I want to have a query by which I should be able to delete rows from this table with these conditions :

delete rows where count of same mainConId exceeds a max limit(say CMax), 
i.e only keep CMax items per mainConId and that too with sorting according to similarity desc
i.e. keep only CMax items which are most similar in this table and remove extraneous  records for every mainConId

So after running this query I should have at-most n*CMax rows in this table where n is number of unique mainConId.

Can someone help me with the query ? I think it should be possible to do with postgres. Thanks in adv.

CodePudding user response：

First you can try this :

WITH list AS
(
SELECT *
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
SELECT *
  FROM list AS l
 WHERE row_number <= CMax

Then if the result corresponds to the rows you want to keep, you can delete the extra rows with :

WITH list AS
(
SELECT mainConId
     , similarity
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
DELETE FROM content_relations AS cr
 USING list AS l
 WHERE cr.mainConId = l.mainConId
   AND cr.similarity = l.similarity
   AND l.row_number > CMax

CodePudding user response：

Based on @Edouard H.'s Answer I reached to the solution point :

WITH list AS
(
SELECT mainConId, relatedConId 
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
DELETE FROM content_relations AS cr
 USING list AS l
 WHERE cr.mainConId = l.mainConId
   AND cr.relatedConId = l.relatedConId
   AND l.row_number > CMax;