How to Update/insert records based on WHERE clause in Oracle-CodePudding

I'm looking for query to upsert(update/insert) into Oracle table from pandas dataframe. After searching I got this merge statement which updates the values when the key is present else it will insert.

merge into table_A using dual on ( key_column = '123' )
when matched then 
update set date_column = TO_TIMESTAMP('2021-01-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'), OR = 'MZM'
when NOT matched then 
insert (key_column, date_column, OR) values( '456',TO_TIMESTAMP('2021-04-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')

Solution I need

for update

I want to update record based on the condition max(date_column) for that key which should be less than or equal to 30 days from the record date_column I'm inserting.

for insert

I want to insert a new row for a key which is already present but the last available record .i.e date_column for that key should not be within 30 days.

What I tried

merge into table_A using dual on ( key_column = '123' )
when matched then 
update set date_column = TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'), ORA = 'MZM'
where (select trunc(TO_TIMESTAMP('2021-01-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3')) - trunc(max(date_column))days from table_A where key = '123') <= 30 
when NOT matched then 
insert (key_column, date_column, ORA) values( '123',TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')
where (select trunc(TO_TIMESTAMP('2022-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3')) - trunc(max(date_column))days from table_A where key = '123') > 30  ;

But it is not working :(.

I cannot do this from pandas because the data size is huge.

Table recreation script

CREATE TABLE "FDSMLDBUSER"."FDS_upsert" ("key" VARCHAR2(255 BYTE) NOT NULL ENABLE, "date_column" TIMESTAMP (6) NOT NULL ENABLE, "ORA" VARCHAR2(100 BYTE))


insert into "FDSMLDBUSER"."FDS_upsert" ("key", "date_column", ora) values ('123',TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')

Can I do this using upsert keyword in oracle where the statement updates when WHERE clause is True or it will insert ?

I appreciate your help so much.

General Question:

Can there be more than one row for a key in the past 30 days in the table? No, we should only have one record per key for past 30 days and that should be the latest one. if any key is older than 30 days then we need to insert new row for that key.

Can there be future dates in the table? Fortunately No.

I use executemany method of cx_oracle to insert records to database where I convert pandas dataframe to dictionary to enable column name inside it. Please find the code below:

connection = cx_Oracle.connect(f'{db_user}/{db_pwd}@{db_host}:{dbport}/{db_service_name}',encoding='UTF-16', nencoding='UTF-16')
cursor = connection.cursor()
parameters = secondary.to_dict(orient='records')
query = config.get('FDS_APAMA_RAWDATA', 'fds_raw_data')
cursor.prepare(query)
cursor.executemany(None,parameters)
for row in cursor:
    print(row[0])
con.commit()

CodePudding user response：

You can LEFT OUTER JOIN the new data to the existing data and find whether a row exists within 30 days and then, if it does, use the ROW_NUMBER analytic function to find the latest matching row and correlate the update using the ROWID pseudo-column:

MERGE INTO table_A dst
USING (
  SELECT d.*,
         a.ROWID AS rid,
         ROW_NUMBER() OVER (ORDER BY a.date_column DESC NULLS LAST) AS rn
  FROM   ( SELECT '123' AS key_column, 
                  TIMESTAMP '2023-01-05 00:00:11.644' AS date_column,
                  'MZM' AS ora
           FROM   DUAL ) d
         LEFT OUTER JOIN table_A a
         ON (    a.key_column = d.key_column
             AND a.date_column BETWEEN d.date_column - INTERVAL '30' DAY
                                   AND d.date_column   INTERVAL '30' DAY
         )
) src
ON ( src.rid = dst.ROWID AND src.rn = 1)
WHEN MATCHED THEN
UPDATE
  SET date_column = src.date_column,
      ora         = src.ora
WHEN NOT MATCHED THEN
  INSERT (key_column, date_column, ora)
  VALUES (src.key_column, src.date_column, src.ora);

Which, for the sample data:

CREATE TABLE table_a (
  key_column VARCHAR2(255 BYTE) NOT NULL ENABLE,
  date_column TIMESTAMP (6) NOT NULL ENABLE,
  ORA VARCHAR2(100 BYTE)
);

INSERT INTO table_a (key_column, date_column, ora)
values ('123', TIMESTAMP '2022-12-31 00:00:11.644', 'MZM');

The the row updates to:

KEY_COLUMN DATE_COLUMN ORA

123 2023-01-05 00:00:11.644000 MZM

KEY_COLUMN	DATE_COLUMN	ORA
123	2023-01-05 00:00:11.644000	MZM

db<>fiddle here

CodePudding user response：

If for the key a row in the last 30 days exists, you want to update this row, otherwise you want to insert a new row.

on ( key_column = '123' )

does not suffice to find that row. So, look up the table for a row macthing the key and the date range instead. E.g.:

merge into table_a using (select 123 as key, date '2022-01-05' as new_date from dual) src on (table_a.key_column = src.key and date_column >= src.new_date - interval '30' day) when matched then update set date_column = src.new_date, ora = 'MZM' when not matched then insert (key_column, date_column, ora) values (src.key, src.new_date, 'MZM'); ;

UPDATE

As it shows, Oracle forbids columns in the ON clause of a MERGE statement to be updated. I consider this a flaw. But well, at least they raise an error.

A simple way to go about this is to use update and insert in a PL/SQL block instead of a merge statement: Update the row if it exists. Insert otherwise (when the update affected zero rows).

begin
  update table_a
  set date_column = :new_date, ora = 'MZM'
  where key = :key and date_column >= :new_date - interval '30' day;
  
  if sql%rowcount = 0 then
    insert into table_a (key_column, date_column, ora) values (:key, :new_date, 'MZM');
  end if;
end;