I have a table that stores the response from a certain API. It has 1.7 million rows. The PK is a kind of Unix time (not exactly, but similar). I call the API very frequently to see if the data has changed. To check whether the data has changed, I have to run this query:
SELECT 1
FROM RATE
WHERE REGDATE = '$apiResponseDate' --yymmddhhmmss
If no row comes back, that means the response has changed, and then I insert it.
I have an index on REGDATE, and I know this lets Oracle do an index lookup instead of a full table scan.
But to know whether the data has been updated, I only need to check the most recent rows. Filtering the whole table with a WHERE clause seems inefficient to me.
Is there a good way to check whether the data I got from the API response is already in the DB? I'm using Oracle, but that isn't the main point, because my question is about query efficiency.
CodePudding user response:
You may use the index_desc hint and filter by rownum to access the table and read only the most recent row, then compare that value with the current API response.
The example below is for a (default) ascending index. If the index is created as id desc, you need to reverse the direction of the read (specify the index_asc hint instead).
create table t ( id not null, val )
as
select level, dbms_random.string('x', 1000)
from dual
connect by level < 5000;

create unique index t_ix on t(id);
select /*+ index_desc(t (t.id)) gather_plan_statistics */
  id, substr(val, 1, 10)
from t
where rownum = 1;
        ID SUBSTR(VAL,1,10)
---------- ----------------
      4999 D0H3YOHB5E
select * from dbms_xplan.display_cursor( format => 'ALL ALLSTATS' );
PLAN_TABLE_OUTPUT
-----------------
SQL_ID  2ym2rg02qfmk4, child number 0
-------------------------------------
select /*+ index_desc(t (t.id)) gather_plan_statistics */ id,
substr(val, 1, 10) from t where rownum = 1

Plan hash value: 1335626365

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       3 |      1 |
|*  1 |  COUNT STOPKEY                       |      |      1 |        |       |            |          |      1 |00:00:00.01 |       3 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |      1 |  1005 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |      1 |
|   3 |    INDEX FULL SCAN DESCENDING        | T_IX |      1 |   4999 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      1 |
------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / "T"@"SEL$1"
   3 - SEL$1 / "T"@"SEL$1"

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM=1)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   2 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   3 - "T".ROWID[ROWID,10], "ID"[NUMBER,22]

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
   2 -  SEL$1 / "T"@"SEL$1"
           -  index_desc(t (t.id))
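Applied to the table in the question (a sketch; the RATE/REGDATE names come from the question, and the final comparison against the API value is assumed to happen in your application code):

```sql
-- Read only the most recent REGDATE via a descending index read.
-- ROWNUM = 1 stops after the first row, so only a few index blocks
-- are touched regardless of table size.
SELECT /*+ index_desc(r (r.regdate)) */ regdate
FROM   rate r
WHERE  rownum = 1;

-- An equivalent without hints, available on Oracle 12c and later:
SELECT regdate
FROM   rate
ORDER  BY regdate DESC
FETCH  FIRST 1 ROW ONLY;
```

If the returned value differs from the date in the latest API response, insert the new row. Note this only works if new responses always carry a REGDATE greater than every existing row's.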
CodePudding user response:
Instead of doing a SELECT and then an INSERT, which is two queries, you could combine the two into a single MERGE statement:
MERGE INTO rate r
USING dual
ON (r.regdate = '$apiResponseDate')
WHEN NOT MATCHED THEN
  INSERT (col1, col2, col3, regdate)
  VALUES ('value1', 'value2', 'value3', '$apiResponseDate');
This saves you a round trip from the middle tier to the database and does it all in a single statement.
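Since the table is polled very frequently, it may also be worth using a bind variable instead of interpolating the date string into the SQL text. This is a sketch: `:api_date` and the `:value*` placeholders are assumed names, not part of the original answer.

```sql
-- Same MERGE, parameterized with bind variables. Binds avoid SQL
-- injection and let Oracle reuse the parsed cursor across the many
-- repeated API polls instead of hard-parsing a new statement each time.
MERGE INTO rate r
USING dual
ON (r.regdate = :api_date)
WHEN NOT MATCHED THEN
  INSERT (col1, col2, col3, regdate)
  VALUES (:value1, :value2, :value3, :api_date);
```

Your driver (JDBC, ODP.NET, python-oracledb, etc.) supplies the bind values when executing the statement.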