I'm looking for query to upsert
(update/insert) into Oracle table from pandas
dataframe. After searching I got this merge
statement which updates the values when the key is present else it will insert.
merge into table_A using dual on ( key_column = '123' )
when matched then
update set date_column = TO_TIMESTAMP('2021-01-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'), OR = 'MZM'
when NOT matched then
insert (key_column, date_column, OR) values( '456',TO_TIMESTAMP('2021-04-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')
Solution I need
for update
I want to update
record based on the condition max(date_column)
for that key
which should be less than or equal to 30 days
from the record date_column
I'm inserting.
for insert
I want to insert a new row for a key
which is already present but the last available record .i.e date_column
for that key should not be within 30 days.
What I tried
merge into table_A using dual on ( key_column = '123' )
when matched then
update set date_column = TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'), ORA = 'MZM'
where (select trunc(TO_TIMESTAMP('2021-01-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3')) - trunc(max(date_column))days from table_A where key = '123') <= 30
when NOT matched then
insert (key_column, date_column, ORA) values( '123',TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')
where (select trunc(TO_TIMESTAMP('2022-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3')) - trunc(max(date_column))days from table_A where key = '123') > 30 ;
But it is not working :(.
I cannot do this from pandas because the data size is huge.
Table recreation script
CREATE TABLE "FDSMLDBUSER"."FDS_upsert" ("key" VARCHAR2(255 BYTE) NOT NULL ENABLE, "date_column" TIMESTAMP (6) NOT NULL ENABLE, "ORA" VARCHAR2(100 BYTE))
insert into "FDSMLDBUSER"."FDS_upsert" ("key", "date_column", ora) values ('123',TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')
Can I do this using upsert
keyword in oracle where the statement updates when WHERE
clause is True or it will insert ?
I appreciate your help so much.
General Question:
Can there be more than one row for a key in the past 30 days in the table? No, we should only have one record per key for past 30 days and that should be the latest one. if any key is older than 30 days then we need to insert new row for that key.
Can there be future dates in the table? Fortunately No.
I use executemany
method of cx_oracle to insert records to database where I convert pandas dataframe
to dictionary
to enable column name inside it. Please find the code below:
connection = cx_Oracle.connect(f'{db_user}/{db_pwd}@{db_host}:{dbport}/{db_service_name}',encoding='UTF-16', nencoding='UTF-16')
cursor = connection.cursor()
parameters = secondary.to_dict(orient='records')
query = config.get('FDS_APAMA_RAWDATA', 'fds_raw_data')
cursor.prepare(query)
cursor.executemany(None,parameters)
for row in cursor:
print(row[0])
con.commit()
CodePudding user response:
You can LEFT OUTER JOIN
the new data to the existing data and find whether a row exists within 30 days and then, if it does, use the ROW_NUMBER
analytic function to find the latest matching row and correlate the update using the ROWID
pseudo-column:
MERGE INTO table_A dst
USING (
SELECT d.*,
a.ROWID AS rid,
ROW_NUMBER() OVER (ORDER BY a.date_column DESC NULLS LAST) AS rn
FROM ( SELECT '123' AS key_column,
TIMESTAMP '2023-01-05 00:00:11.644' AS date_column,
'MZM' AS ora
FROM DUAL ) d
LEFT OUTER JOIN table_A a
ON ( a.key_column = d.key_column
AND a.date_column BETWEEN d.date_column - INTERVAL '30' DAY
AND d.date_column INTERVAL '30' DAY
)
) src
ON ( src.rid = dst.ROWID AND src.rn = 1)
WHEN MATCHED THEN
UPDATE
SET date_column = src.date_column,
ora = src.ora
WHEN NOT MATCHED THEN
INSERT (key_column, date_column, ora)
VALUES (src.key_column, src.date_column, src.ora);
Which, for the sample data:
CREATE TABLE table_a (
key_column VARCHAR2(255 BYTE) NOT NULL ENABLE,
date_column TIMESTAMP (6) NOT NULL ENABLE,
ORA VARCHAR2(100 BYTE)
);
INSERT INTO table_a (key_column, date_column, ora)
values ('123', TIMESTAMP '2022-12-31 00:00:11.644', 'MZM');
The the row updates to:
KEY_COLUMN DATE_COLUMN ORA 123 2023-01-05 00:00:11.644000 MZM
db<>fiddle here
CodePudding user response:
If for the key a row in the last 30 days exists, you want to update this row, otherwise you want to insert a new row.
on ( key_column = '123' )
does not suffice to find that row. So, look up the table for a row macthing the key and the date range instead. E.g.:
merge into table_a
using (select 123 as key, date '2022-01-05' as new_date from dual) src
on (table_a.key_column = src.key and date_column >= src.new_date - interval '30' day)
when matched then
update set date_column = src.new_date, ora = 'MZM'
when not matched then
insert (key_column, date_column, ora) values (src.key, src.new_date, 'MZM');
;
UPDATE
As it shows, Oracle forbids columns in the ON clause of a MERGE statement to be updated. I consider this a flaw. But well, at least they raise an error.
A simple way to go about this is to use update and insert in a PL/SQL block instead of a merge statement: Update the row if it exists. Insert otherwise (when the update affected zero rows).
begin
update table_a
set date_column = :new_date, ora = 'MZM'
where key = :key and date_column >= :new_date - interval '30' day;
if sql%rowcount = 0 then
insert into table_a (key_column, date_column, ora) values (:key, :new_date, 'MZM');
end if;
end;