As the title says, I'm trying to delete & insert records to an Historical Table.
The Historical tbl_b has to receive records from a tbl_a, filtered by the column val_id; but there is a limit of records per each val_id, also it must keep only the most recent ones per val_id.
TBL_A
- It has columns id_val, reg_date, flag
- it has up to date records (constantly, records are inserted on this table).
- only records with flag=1 should be inserted on TBL_B.
- records are deleted by another scheduled process.
val_id | reg_date | flag |
---|---|---|
33 | 2022-10-20 23:00:00 | 1 |
22 | 2022-10-20 22:00:01 | 0 |
22 | 2022-10-20 22:00:02 | 1 |
11 | 2022-10-20 21:00:01 | 1 |
11 | 2022-10-20 21:00:02 | 1 |
11 | 2022-10-20 21:00:03 | 1 |
TBL_B:
- It has columns id_val, reg_date
- it should store 2 records per id_val and the most recent ones (order by reg_date).
- it's partitioned monthly, this table will store 150 Million records aprox.
val_id | reg_date |
---|---|
11 | 2022-10-19 11:00:01 |
11 | 2022-10-19 11:00:02 |
22 | 2022-10-19 12:00:01 |
22 | 2022-10-19 12:00:02 |
Desired Result on TBL_B:
val_id | reg_date |
---|---|
11 | 2022-10-20 21:00:02 |
11 | 2022-10-20 21:00:03 |
22 | 2022-10-19 12:00:02 |
22 | 2022-10-20 22:00:02 |
33 | 2022-10-20 23:00:00 |
To approach this, I'm trying to do it in 2 steps:
- first delete records from TBL_B, if it's necessary.
- then insert records from TBL_A, since they are always the most recent ones.
But at this moment I'm stuck trying to filter the records that should be deleted from TBL_B.
Link: http://sqlfiddle.com/#!4/73271c/1
DELETE FROM tbl_b where rowid in (
SELECT
rowid
FROM (SELECT
m.*,
ROW_NUMBER() OVER (
PARTITION BY id_val
ORDER BY
reg_date DESC
) AS rownumb
FROM (SELECT
h.*
FROM tbl_b h
LEFT JOIN (SELECT
*
FROM (SELECT
tbl_a.*,
ROW_NUMBER() OVER (
PARTITION BY id_val
ORDER BY
reg_date DESC
) AS seqnum
FROM tbl_a
WHERE flag = 1) f
WHERE f.seqnum <= 2) t
ON t.id_val = h.id_val) m) n
WHERE n.rownumb > 2
);
Any help is appreciated.
CodePudding user response:
It looks like you have to use analytic functions DENSE_RANK()
and ROW_NUMBER()
in your SQL query in order to resolve the task:
DELETE FROM tbl_b
WHERE reg_date in (SELECT reg_date
FROM (SELECT t.*
, dense_rank() over (order by trunc(reg_date) desc) as d_rank
, row_number() over (PARTITION BY TRUNC(REG_DATE) order by reg_date DESC) as r_num
FROM tbl_b t)
WHERE d_rank = 2 and r_num <> 1);
Note: DELETE
has to run after insert into the records from TBL_A (the most recent ones)
It would be good idea - CREATE TRIGGER AFTER INSERT ON tbl_b
to perform DELETE
CodePudding user response:
Actually, I'd rather do it just the opposite of what you wanted: I'd insert rows first, and delete superfluous rows next. Why? Because - if you're deleting rows first, you have to calculate how many rows to leave in TBL_B
so that you'd insert the right number of rows from TBL_A
so that TBL_B
always contains up to 2 most recent rows. That's just too complex.
Also, indexes you currently have on these tables are kind of wrong; I'd index id_val
and reg_date
in both tables; indexing flag
(at the first sight, unless there are many flags, not just e.g. 0 and 1) won't help much.
OK, here's my suggestion: initial TBL_B
table contents:
SQL> select * from tbl_b order by id_val, reg_date;
ID_VAL REG_DATE
------ -------------------
11 2022-10-19 11:00:01
11 2022-10-19 11:00:02
22 2022-10-19 12:00:01
22 2022-10-19 12:00:02
SQL>
Insert first: fetch two most recent rows per id_val
- that's the maximum rows you'd have if there are no rows in TBL_B
at all:
SQL> insert into tbl_b (id_val, reg_date)
2 with temp as
3 (select id_val,
4 reg_date,
5 rank() over (partition by id_val order by reg_date desc) rnk
6 from tbl_a
7 where flag = 1
8 )
9 select t.id_val, t.reg_date
10 from temp t
11 where t.rnk <= 2;
4 rows created.
Delete superfluous rows:
SQL> delete from tbl_b a
2 where exists
3 (select null
4 from (select b.id_val,
5 b.reg_date,
6 rank() over (partition by b.id_val order by b.reg_date desc) rnk
7 from tbl_b b
8 ) x
9 where x.id_val = a.id_val
10 and x.reg_date = a.reg_date
11 and x.rnk > 2
12 );
3 rows deleted.
The final result:
SQL> select * from tbl_b order by id_val, reg_date;
ID_VAL REG_DATE
------ -------------------
11 2022-10-20 11:00:02
11 2022-10-20 11:00:03
22 2022-10-19 12:00:02
22 2022-10-20 22:00:02
33 2022-10-20 23:00:00
SQL>