I have a table with this structure :
create table content_relations (
mainConId Integer not null,
relatedConId Integer not null,
similarity float not null,
relatedConAddedOn TIMESTAMP WITH TIME ZONE Not null);
Now I want to have a query by which I should be able to delete rows from this table with these conditions :
delete rows where count of same mainConId exceeds a max limit(say CMax),
i.e only keep CMax items per mainConId and that too with sorting according to similarity desc
i.e. keep only CMax items which are most similar in this table and remove extraneous records for every mainConId
So after running this query I should have at-most n*CMax rows in this table where n is number of unique mainConId.
Can someone help me with the query ? I think it should be possible to do with postgres. Thanks in adv.
CodePudding user response:
First you can try this :
WITH list AS
(
SELECT *
, row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
FROM content_relations
)
SELECT *
FROM list AS l
WHERE row_number <= CMax
Then if the result corresponds to the rows you want to keep, you can delete the extra rows with :
WITH list AS
(
SELECT mainConId
, similarity
, row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
FROM content_relations
)
DELETE FROM content_relations AS cr
USING list AS l
WHERE cr.mainConId = l.mainConId
AND cr.similarity = l.similarity
AND l.row_number > CMax
CodePudding user response:
Based on @Edouard H.'s Answer I reached to the solution point :
WITH list AS
(
SELECT mainConId, relatedConId
, row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
FROM content_relations
)
DELETE FROM content_relations AS cr
USING list AS l
WHERE cr.mainConId = l.mainConId
AND cr.relatedConId = l.relatedConId
AND l.row_number > CMax;