hi I loaded a csv file into my database table, where I appended the rows from the csv file. But I accidentally appended the rows twice so now the data is showing up as duplicates in my table, which looks like this:
data_run_date | fiscal_quarter | invoice_id | invoice_type |
---|---|---|---|
2022-07-01 | Q4 | 12345 | Paid |
2022-07-01 | Q4 | 12345 | Paid |
2022-01-01 | Q1 | 12345 | Not Paid |
2022-07-01 | Q4 | 12678 | Paid |
2022-01-01 | Q1 | 12678 | Not Paid |
I need the table to look like this
data_run_date | fiscal_quarter | invoice_id | invoice_type |
---|---|---|---|
2022-07-01 | Q4 | 12345 | Paid |
2022-01-01 | Q1 | 12345 | Not Paid |
2022-07-01 | Q4 | 12678 | Paid |
2022-01-01 | Q1 | 12678 | Not Paid |
the second record of paid for invoice id 12345 was a result of my accidental append to the table twice. I a trying to delete this record from the database table.
Now the simple solution would be to do this:
DELETE FROM db.invoice
WHERE data_run_date = TIMESTAMP '2022-07-01 00:00:00.000'
and then re-append the csv
But I wanted to know if there are other better/easier/more efficient ways to do it. So I tried this:
DELETE FROM db.invoice
WHERE EXISTS (
SELECT * FROM
(
SELECT t.*,
ROW_NUMBER () OVER (PARTITION BY INVOICE_ID
ORDER BY DATA_RUN_DATE) RN
FROM db.invoice t
)
WHERE RN = 2
AND DATA_RUN_DATE = TIMESTAMP '2022-07-01 00:00:00.000'
)
But this deleted all the records from the table (luckily I am working on a temp table so I was able to add all the original data to the temp table from the main table using the INSERT statement. But if someone could help me with a better solution. I would much appreciate it.
thank you
CodePudding user response:
Here's one option.
Sample data:
SQL> select * From invoice order by invoice_id, fiscal_quarter desc;
DATA_RUN_D FI INVOICE_ID INVOICE_
---------- -- ---------- --------
2022-07-01 Q4 12345 Paid
2022-07-01 Q4 12345 Paid
2022-01-01 Q1 12345 Not Paid
2022-07-01 Q4 12678 Paid
2022-01-01 Q1 12678 Not Paid
Delete duplicates:
SQL> delete from invoice a
2 where a.rowid > (select min(b.rowid)
3 from invoice b
4 where b.invoice_id = a.invoice_id
5 and b.fiscal_quarter = a.fiscal_quarter
6 and b.invoice_type = a.invoice_type
7 and b.data_run_date = a.data_run_date
8 );
1 row deleted.
Result:
SQL> select * From invoice order by invoice_id, fiscal_quarter desc;
DATA_RUN_D FI INVOICE_ID INVOICE_
---------- -- ---------- --------
2022-07-01 Q4 12345 Paid
2022-01-01 Q1 12345 Not Paid
2022-07-01 Q4 12678 Paid
2022-01-01 Q1 12678 Not Paid
SQL>
CodePudding user response:
You can use:
DELETE FROM db.invoice
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT ROW_NUMBER() OVER (
PARTITION BY data_run_date, fiscal_quarter, invoice_id, invoice_type
ORDER BY NULL
) AS rn
FROM db.invoice
)
WHERE rn > 1
)
Which deletes the duplicate row.
db<>fiddle here