For an interview, I had to write a SQL-Query that deletes BestellNummer and Type duplicates from a tabel.
I wasn't allowed to use a temp table and should do it in one query.
With the help of another question on stackoverflow, I came to this solution (T-SQL: Deleting all duplicate rows but keeping one):
DELETE FROM auftrag WHERE ID NOT IN
(
SELECT MIN(ID) FROM auftrag GROUP BY BestellNummer, Type
)
The auftrag table looked like this:
ID BestellNummer Type Number
0 123 O 1000
1 123 O 1001
2 123 E 1002
3 512 O 1003
4 512 O 1004
5 732 E 1005
The query now deletes ID 1 and 4, cause they are duplicates.
My question is, how does this query actually work? I can make out some bits, but I am a little confused of it.
It would be nice, if someone could give me a breakdown of how it works (:
CodePudding user response:
The GROUP BY clause divides your data into groups based on unique combinations of the columns Bestellnummer and Type, here I have divided the rows with lines to show the groups:
ID BestellNummer Type Number
----------------------------------
0 123 O 1000
1 123 O 1001
----------------------------------
2 123 E 1002
----------------------------------
3 512 O 1003
4 512 O 1004
----------------------------------
5 732 E 1005
Then the MIN(id) simply finds the minimum value of the "id" column in each group, leaving you with id 0, 2, 3, and 5.
Then the DELETE says to delete the rows NOT IN (0, 2, 3, 5), thereby deleting rows 1 and 4, giving you one row per unique combination of Bestellnummer and Type.
CodePudding user response:
By following way you can delete duplicate Records of BestellNummer and Type
WITH cte AS (
SELECT *, row_number() OVER(PARTITION BY [BestellNummer],[Type] ORDER BY [ID]) AS [rn] FROM auftrag
)
DELETE cte WHERE [rn] > 1
Above Query is formated based on MSSQL