Delete rows from Oracle table where count is more than 1 and date is a certain date-CodePudding

hi I loaded a csv file into my database table, where I appended the rows from the csv file. But I accidentally appended the rows twice so now the data is showing up as duplicates in my table, which looks like this:

data_run_date	fiscal_quarter	invoice_id	invoice_type
2022-07-01	Q4	12345	Paid
2022-07-01	Q4	12345	Paid
2022-01-01	Q1	12345	Not Paid
2022-07-01	Q4	12678	Paid
2022-01-01	Q1	12678	Not Paid

I need the table to look like this

data_run_date	fiscal_quarter	invoice_id	invoice_type
2022-07-01	Q4	12345	Paid
2022-01-01	Q1	12345	Not Paid
2022-07-01	Q4	12678	Paid
2022-01-01	Q1	12678	Not Paid

the second record of paid for invoice id 12345 was a result of my accidental append to the table twice. I a trying to delete this record from the database table.

Now the simple solution would be to do this:

DELETE FROM db.invoice
WHERE data_run_date = TIMESTAMP '2022-07-01 00:00:00.000'

and then re-append the csv

But I wanted to know if there are other better/easier/more efficient ways to do it. So I tried this:

DELETE FROM db.invoice  
WHERE EXISTS (
SELECT * FROM 
(
SELECT t.*,
       ROW_NUMBER () OVER (PARTITION BY INVOICE_ID
                                ORDER BY DATA_RUN_DATE) RN 
FROM db.invoice t
)
WHERE RN = 2
AND DATA_RUN_DATE = TIMESTAMP '2022-07-01 00:00:00.000'
)

But this deleted all the records from the table (luckily I am working on a temp table so I was able to add all the original data to the temp table from the main table using the INSERT statement. But if someone could help me with a better solution. I would much appreciate it.

thank you

CodePudding user response：

Here's one option.

Sample data:

SQL> select * From invoice order by invoice_id, fiscal_quarter desc;

DATA_RUN_D FI INVOICE_ID INVOICE_
---------- -- ---------- --------
2022-07-01 Q4      12345 Paid
2022-07-01 Q4      12345 Paid
2022-01-01 Q1      12345 Not Paid
2022-07-01 Q4      12678 Paid
2022-01-01 Q1      12678 Not Paid

Delete duplicates:

SQL> delete from invoice a
  2  where a.rowid > (select min(b.rowid)
  3                   from invoice b
  4                   where b.invoice_id = a.invoice_id
  5                     and b.fiscal_quarter = a.fiscal_quarter
  6                     and b.invoice_type = a.invoice_type
  7                     and b.data_run_date = a.data_run_date
  8                  );

1 row deleted.

Result:

SQL> select * From invoice order by invoice_id, fiscal_quarter desc;

DATA_RUN_D FI INVOICE_ID INVOICE_
---------- -- ---------- --------
2022-07-01 Q4      12345 Paid
2022-01-01 Q1      12345 Not Paid
2022-07-01 Q4      12678 Paid
2022-01-01 Q1      12678 Not Paid

SQL>

CodePudding user response：

You can use:

DELETE FROM db.invoice
WHERE ROWID IN (
  SELECT ROWID
  FROM   (
    SELECT ROW_NUMBER() OVER (
             PARTITION BY data_run_date, fiscal_quarter, invoice_id, invoice_type
             ORDER BY NULL
           ) AS rn
     FROM  db.invoice
  )
  WHERE rn > 1
)

Which deletes the duplicate row.

db<>fiddle here