What's the best way to optimize a query of search 1000 rows in 50 million by date? Oracle-CodePudding

I have table

CREATE TABLE ard_signals
(id, val, str, date_val, dt);

This table records the values of all ID's signals. Unique ID's around 950. At the moment, the table contains about 50 million rows with different values of these signals. Each ID can have only a numeric values, string values or date values.

I get the last value of each ID, which, by condition, is less than input date:

select ID,
       max(val) keep (dense_rank last order by dt desc) as val,
       max(str) keep (dense_rank last order by dt desc) as str,
       max(date_val) keep (dense_rank lastt order by dt desc) as date_val,
       max(dt)
 where dt <= to_date(any_date)
 group by id;

I have index on ID. At the moment, the request takes about 30 seconds. Help, please, what ways of optimization it is possible to make for the given request?

EXPLAIN PLAN: with dt index

Example Data(This kind of rows are about 950-1000):

ID	VAL	STR	DT
920	0		20.07.2022 9:59:11
490		yes	20.07.2022 9:40:01
565	233		20.07.2022 9:32:03
32	1		20.07.2022 9:50:01

CodePudding user response：

TL;DR You need your application to maintain a table of distinct id values.

So, you want the last record for each group (distinct id value) in your table, without doing a full table scan. It seems like it should be easy to tell Oracle: iterate through the distinct values for id and then do an index lookup to get the last dt value for each id and then give me that row. But looks are deceiving here -- it's not easy at all, if it is even possible.

Think about what an index on (id) (or (id, dt)) has. It has a bunch of leaf blocks and a structure to find the highest value of id in each block. But Oracle can only use the index to get all the distinct id values by reading every leaf block in the index. So, we might be find a way to trade our TABLE FULL SCAN for an INDEX_FFS for a marginal benefit, but it's not really what we're hoping for.

What about partitioning? Can't we create ARD_SIGNALS with PARTITION BY LIST (id) AUTOMATIC and use that? Now the data dictionary is guaranteed to have a separate partition for each distinct id value.

But again - think about what Oracle knows (from DBA_TAB_PARTITIONS) -- it knows what the highest partition key value is in each partition. It is true: for a list partitioned table, that highest value is guaranteed to be the only distinct value in the partition. But I think using that guarantee requires special optimizations that Oracle's CBO does not seem to make (yet).

So, unfortunately, you are going to need to modify your application to keep a parent table for ARDS_SIGNALS that has a (single) row for each distinct id.

Even then, it's kind of difficult to get what we want. Because, again, want Oracle to iterate through the distinct id values, then use the index to find the one with the highest dt for that id .. and then stop. So, we're looking for an execution plan that makes use of the INDEX RANGE SCAN (MIN/MAX) operation.

I find that tricky with joins, but not so hard with scalar subqueries. So, assuming we named our parent table ARD_IDS, we can start with this:

SELECT i.id, 
       ( SELECT max(dt) 
         FROM ard_signals s 
         WHERE s.id = i.id 
         AND s.dt <= to_date(trunc(SYSDATE)   2  10/86400) -- replace with your date variable
         AND rownum <= 1 ) max_dt
FROM   ard_ids i;

----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |   1000 |00:00:00.01 |      22 |
|   1 |  SORT AGGREGATE               |                |   1000 |      1 |   1000 |00:00:00.02 |    4000 |
|*  2 |   COUNT STOPKEY               |                |   1000 |        |   1000 |00:00:00.02 |    4000 |
|   3 |    FIRST ROW                  |                |   1000 |      1 |   1000 |00:00:00.01 |    4000 |
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| ARD_SIGNALS_N1 |   1000 |      1 |   1000 |00:00:00.01 |    4000 |
|   5 |  TABLE ACCESS FULL            | ARD_IDS        |      1 |     50 |   1000 |00:00:00.01 |      22 |
----------------------------------------------------------------------------------------------------------
 
Outline Data
-------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(ROWNUM<=1)
   4 - access("S"."ID"=:B1 AND "S"."DT"<=TO_DATE(TO_CHAR(TRUNC(SYSDATE@!) 2 .000115740740740740740
              740740740740740740741)))

NOTE: the rownum <= 1 condition it to keep Oracle from merging the scalar subquery into the rest of the query and losing our desired access path.

Then, it is a matter of using those (id, max(dt)) combinations to look up the rows from the table to get the other column values. I came up with this; improvements may be possible (especially if (id, dt) is not as selective as I am assuming it is):

with k AS (
select i.id, ( SELECT max(dt) FROM ard_signals s WHERE s.id = i.id AND s.dt <= to_date(trunc(SYSDATE)   2  10/86400) AND rownum <= 1 ) max_dt
from   ard_ids i
) 
SELECT k.id,
       max(val) keep ( dense_rank first order by dt desc, s.rowid ) val,
       max(str) keep ( dense_rank first order by dt desc, s.rowid ) str,
       max(date_val) keep ( dense_rank first order by dt desc, s.rowid ) date_val,
       max(dt) keep ( dense_rank first order by dt desc, s.rowid ) dt
FROM   k 
INNER JOIN ard_signals s ON s.id = k.id AND s.dt = k.max_dt
GROUP BY k.id;

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |      1 |        |   1000 |00:00:00.05 |    8007 |       |       |          |
|   1 |  SORT GROUP BY                       |                |      1 |   1000 |   1000 |00:00:00.05 |    8007 |   302K|   302K|  268K (0)|
|   2 |   NESTED LOOPS                       |                |      1 |   1005 |   1000 |00:00:00.04 |    8007 |       |       |          |
|   3 |    NESTED LOOPS                      |                |      1 |   1005 |   1000 |00:00:00.03 |    7007 |       |       |          |
|   4 |     TABLE ACCESS FULL                | ARD_IDS        |      1 |   1000 |   1000 |00:00:00.01 |       3 |       |       |          |
|*  5 |     INDEX RANGE SCAN                 | ARD_SIGNALS_N1 |   1000 |      1 |   1000 |00:00:00.03 |    7004 |       |       |        |
|   6 |      SORT AGGREGATE                  |                |   1000 |      1 |   1000 |00:00:00.02 |    4000 |       |       |        |
|*  7 |       COUNT STOPKEY                  |                |   1000 |        |   1000 |00:00:00.02 |    4000 |       |       |        |
|   8 |        FIRST ROW                     |                |   1000 |      1 |   1000 |00:00:00.01 |    4000 |       |       |        |
|*  9 |         INDEX RANGE SCAN (MIN/MAX)   | ARD_SIGNALS_N1 |   1000 |      1 |   1000 |00:00:00.01 |    4000 |       |       |        |
|  10 |    TABLE ACCESS BY GLOBAL INDEX ROWID| ARD_SIGNALS    |   1000 |      1 |   1000 |00:00:00.01 |    1000 |       |       |        |
--------------------------------------------------------------------------------------------------------------------------------------------

... 8000 gets and 5/100ths of a second.