I have a table that stores the response from a certain API. It has 1.7 million rows. The PK is a kind of Unix time (not exactly, but similar). I call the API very frequently to see if the data has changed. To check whether the data has changed, I have to run this query:
SELECT 1
FROM RATE
WHERE REGDATE = '$apiResponseDate' --yymmddhhmmss
If no row comes back, that means the response has changed, and then I insert it.
I have an index on REGDATE, and I know this lets Oracle do an index lookup instead of a full table scan.
But to know whether the data has been updated, I only need to check the most recent rows. Filtering the whole table with a WHERE clause seems inefficient to me.
Is there a good way to check whether the data I got from the API response is already in the DB? I'm using Oracle, but that isn't the main point, because my question is about query efficiency.
CodePudding user response:
You may use the index_desc hint and filter by rownum to access the table and read only the most recent row, then compare that value with the current API response.
The example below is for a (default) ascending index. If the index is created as id desc, you need to reverse the direction of the read (specify the index_asc hint instead).
create table t ( id not null, val )
as
select level, dbms_random.string('x', 1000)
from dual
connect by level < 5000;

create unique index t_ix on t(id);
select /*+ index_desc(t (t.id)) gather_plan_statistics */
  id, substr(val, 1, 10)
from t
where rownum = 1;
        ID SUBSTR(VAL,1,10)
---------- ----------------
      4999 D0H3YOHB5E
select * from dbms_xplan.display_cursor( format => 'ALL ALLSTATS' );
PLAN_TABLE_OUTPUT
-----------------
SQL_ID  2ym2rg02qfmk4, child number 0
-------------------------------------
select /*+ index_desc(t (t.id)) gather_plan_statistics */ id,
substr(val, 1, 10) from t where rownum = 1

Plan hash value: 1335626365

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       3 |      1 |
|*  1 |  COUNT STOPKEY                       |      |      1 |        |       |            |          |      1 |00:00:00.01 |       3 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |      1 |  1005 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |      1 |
|   3 |    INDEX FULL SCAN DESCENDING        | T_IX |      1 |   4999 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      1 |
------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / "T"@"SEL$1"
   3 - SEL$1 / "T"@"SEL$1"

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM=1)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   2 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   3 - "T".ROWID[ROWID,10], "ID"[NUMBER,22]

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
   2 -  SEL$1 / "T"@"SEL$1"
           -  index_desc(t (t.id))
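Applied to the table in the question (a sketch; the RATE/REGDATE names come from the question, and the final comparison against the API value is assumed to happen in your application code):

```sql
-- Read only the most recent REGDATE via a descending index read.
-- ROWNUM = 1 stops after the first row, so only a few index blocks
-- are touched regardless of table size.
SELECT /*+ index_desc(r (r.regdate)) */ regdate
FROM   rate r
WHERE  rownum = 1;

-- An equivalent without hints, available on Oracle 12c and later:
SELECT regdate
FROM   rate
ORDER  BY regdate DESC
FETCH  FIRST 1 ROW ONLY;
```

If the returned value differs from the date in the latest API response, insert the new row. Note this only works if new responses always carry a REGDATE greater than every existing row's.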
CodePudding user response:
Instead of doing a SELECT and then an INSERT, which is two queries, you could combine the two into a single MERGE statement:
MERGE INTO rate r
USING dual
ON (r.regdate = '$apiResponseDate')
WHEN NOT MATCHED THEN
  INSERT (col1, col2, col3, regdate)
  VALUES ('value1', 'value2', 'value3', '$apiResponseDate');
This saves you a round trip from the middle tier to the database and does it all in a single statement.
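Since the table is polled very frequently, it may also be worth using a bind variable instead of interpolating the date string into the SQL text. This is a sketch: `:api_date` and the `:value*` placeholders are assumed names, not part of the original answer.

```sql
-- Same MERGE, parameterized with bind variables. Binds avoid SQL
-- injection and let Oracle reuse the parsed cursor across the many
-- repeated API polls instead of hard-parsing a new statement each time.
MERGE INTO rate r
USING dual
ON (r.regdate = :api_date)
WHEN NOT MATCHED THEN
  INSERT (col1, col2, col3, regdate)
  VALUES (:value1, :value2, :value3, :api_date);
```

Your driver (JDBC, ODP.NET, python-oracledb, etc.) supplies the bind values when executing the statement.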