How to correct my Snowflake Unique Constraint SQL statement?-CodePudding

I have a table that looks like:

ID|CREATED   |VALUE
1 |1649122158|200
1 |1649122158|200
1 |1649122158|200

That I'd like to look like:

ID|CREATED   |VALUE
1 |1649122158|200

And I run the following query:

DELETE FROM MY_TABLE T USING (SELECT ID,CREATED,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CREATED DESC) AS RANK_IN_KEY FROM MY_TABLE T) X WHERE X.RANK_IN_KEY <> 1 AND T.ID = X.ID AND T.CREATED = X.CREATED

But it removes everything from MY_TABLE and not just other rows with the same value. This is more than just selecting distinct records, I'd like to enforce a unique constraint to get the latest value of ID and keep just one record for it, even if there were duplicates.

ID|CREATED   |VALUE
1 |1649122158|200
1 |1649122159|300
2 |1649122158|200
2 |1649122158|200
3 |1649122170|500
3 |1649122160|200

Would become (using the same final unique constraint statement):

ID|CREATED   |VALUE
1 |1649122159|300
2 |1649122158|200
3 |1649122170|500

How can I improve my logic to properly handle these unique constraint modifications?

CodePudding user response：

The following setup should leave rows with id of 1 and 3. And not delete all rows as you say.

Schema

create table t (
  id int,
  created int , 
  value int
);

insert into t values(1, 1649122158, 200);
insert into t values(1 ,1649122159, 300);
insert into t values(2 ,1649122158, 200);
insert into t values(2 ,1649122158, 200);
insert into t values(3 ,1649122170, 500);
insert into t values(3 ,1649122160, 200);

Delete statement

with x as (
 SELECT 
  id, created,
  row_number() over(partition by id) as r
 FROM t
)
delete from t
using x
where x.id = t.id and x.r <> 1 and x.created = t.created
;

Output

select * from t;

1   1649122158  200
3   1649122170  500

The logic is such, that the table in the using clause is joined with the operated on table. Following the join logic, it just matches by some key. In your case, you have key as {id,created}. This key is duplicated for rows with id of 2. So the whole group is deleted.

I'm no savvy in database schemas. But as a thought, you may add a row with a rank to existing table. And after that you can proceed with deletion. This way you do not need to create other table and insert values to that. Be warned that data may become fragmented(physically, on disks). So you will need to run some kind of tune up later.

Update

You may find this almost one-liner interesting:

SO answer

I will duplicate code here, as it is so small and well written.

WITH
  u AS (SELECT DISTINCT * FROM your_table),
  x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;

CodePudding user response：

Check out this post: https://community.snowflake.com/s/question/0D50Z00008EJgemSAD/how-to-delete-duplicate-records-

If all columns make up a unique records, the recommended solution is the insert all the records into a new table with SELECT DISTINCT * and do a swap. You could also do a INSERT OVERWRITE INTO the same table.

Something like INSERT OVERWRITE INTO tableA SELECT DISTINCT * FROM tableA;