Query performance of incremental "greater than" vs. separate "between" conditions

Time: 10-22

I have a query that looks like the following:

select
  CASE
    WHEN col BETWEEN 0 AND 20 THEN 0
    WHEN col BETWEEN 20 AND 50 THEN 20
    WHEN col BETWEEN 50 AND 100 THEN 40
    WHEN col BETWEEN 100 AND 200 THEN 75
    WHEN col BETWEEN 200 AND 10000 THEN 86
    ELSE 0
  END AS t_col
from TAB1;

Hoping for better performance, I rewrote it as:

select
  case
    WHEN col < 20 then 0
    WHEN col < 50 then 20
    WHEN col < 100 then 40
    WHEN col < 200 then 75
    WHEN col < 1000 then 86
  END AS t_col
from TAB1;

I thought the second query might be faster because each branch compares the column against a single value instead of evaluating an interval. However, the Explain Plan gives identical results for both queries. Which one actually performs better?
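(A side note on semantics, not part of the original question: Oracle defines `x BETWEEN a AND b` as the two inclusive comparisons `x >= a AND x <= b`, so the rewrite saves one comparison per branch, but the two versions do not classify boundary values identically.)

```sql
-- "col BETWEEN 0 AND 20" is shorthand for the two inclusive comparisons below,
-- so each BETWEEN branch costs two comparisons versus one for "col < 20".
-- Boundary difference: for col = 20 the first query returns 0 (20 matches
-- BETWEEN 0 AND 20), while the second returns 20 (20 is not < 20).
select case
         when col >= 0 and col <= 20 then 0  -- equivalent to BETWEEN 0 AND 20
         else 0
       end as t_col
from TAB1;
```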

CodePudding user response:

Well, let me do a PoC and see what happens:

SQL> create table t1 ( c1 number, c2 varchar2(40) , c3 varchar2(40) ) ;

Table created.

SQL> declare
  2  begin
  3  for i in 1 .. 1000000
  4  loop
  5   insert into t1 values ( round(dbms_random.value(1,100)) , dbms_random.string('X',40) , dbms_random.string('X',40) );
  6  end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER','T1');

PL/SQL procedure successfully completed.

SQL> select count(*) from t1 ;

  COUNT(*)
----------
   1000000

Scenario

SQL> set autotrace traceonly
SQL> select
CASE
WHEN c1 BETWEEN 0 AND 20 THEN 0
WHEN c1 BETWEEN 20 AND 50 THEN 20
WHEN c1 BETWEEN 50 AND 100 THEN 40
WHEN c1 BETWEEN 100 AND 200 THEN 75
WHEN c1 BETWEEN 200 AND 10000 THEN 86
ELSE 0 END AS t_c1
from t1;

1000000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|  2929K|  2322   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T1   |  1000K|  2929K|  2322   (1)| 00:00:01 |
--------------------------------------------------------------------------

SQL> select case
WHEN c1 < 20 then 0
WHEN c1 < 50 then 20
WHEN c1 < 100 then 40
WHEN c1 < 200 then 75
WHEN c1 < 1000 then 86
END AS t_c1
from t1;

1000000 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|  2929K|  2322   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T1   |  1000K|  2929K|  2322   (1)| 00:00:01 |
--------------------------------------------------------------------------

They behave identically, right? Not exactly. Although the plan looks the same, let's check the statistics. For that, I will flush the buffer cache and shared pool between the tests.

SQL> set autotrace traceonly timing on
SQL> select
CASE
WHEN c1 BETWEEN 0 AND 20 THEN 0
WHEN c1 BETWEEN 20 AND 50 THEN 20
WHEN c1 BETWEEN 50 AND 100 THEN 40
WHEN c1 BETWEEN 100 AND 200 THEN 75
WHEN c1 BETWEEN 200 AND 10000 THEN 86
ELSE 0 END AS t_c1
from t1;

1000000 rows selected.

Elapsed: 00:00:02.92

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|  2929K|  2322   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T1   |  1000K|  2929K|  2322   (1)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
      72870  consistent gets
       6180  physical reads
          0  redo size
   19435128  bytes sent via SQL*Net to client
     733901  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

SQL> alter system flush shared_pool ;

System altered.

Elapsed: 00:00:00.08
SQL> alter system flush buffer_cache ;

System altered.

Elapsed: 00:00:00.04
SQL> select t1.* , case
WHEN c1 < 20 then 0
WHEN c1 < 50 then 20
WHEN c1 < 100 then 40
WHEN c1 < 200 then 75
WHEN c1 < 1000 then 86
END AS t_c1
from t1;

1000000 rows selected.

Elapsed: 00:00:03.49

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|    81M|  2323   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T1   |  1000K|    81M|  2323   (1)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         18  recursive calls
          0  db block gets
      72878  consistent gets
       6180  physical reads
          0  redo size
  101747627  bytes sent via SQL*Net to client
     733834  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

The second run is slower than the first (3.49s vs. 2.92s), and it also makes more recursive calls (18 vs. 12). Note, however, that the second statement selects t1.* in addition to the CASE column, so it ships roughly 102 MB to the client instead of 19 MB (see "bytes sent via SQL*Net"), which skews the comparison. If you start adding more columns and the data is huge, the BETWEEN version may still come out ahead, but much of the difference seen here is explained by the extra column traffic.
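To rule out the extra network traffic from `t1.*`, a fairer rerun would use identical select lists in both statements, for example (a sketch, not part of the original test; flush the caches between the two runs as above):

```sql
set autotrace traceonly statistics
-- variant 1: BETWEEN, projecting only the derived column
select case
         when c1 between 0 and 20    then 0
         when c1 between 20 and 50   then 20
         when c1 between 50 and 100  then 40
         when c1 between 100 and 200 then 75
         when c1 between 200 and 10000 then 86
         else 0
       end as t_c1
from t1;
-- variant 2: "<" comparisons, same select list (no t1.*)
select case
         when c1 < 20   then 0
         when c1 < 50   then 20
         when c1 < 100  then 40
         when c1 < 200  then 75
         when c1 < 1000 then 86
       end as t_c1
from t1;
```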

But obviously, this is just an impression; you must test it yourself in your own database. I am quite sure the final query contains other columns, WHERE conditions, and so on, so this answer only covers what was posted in the original question, not a real-world scenario with many other columns, predicates, indexes, etc.
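If you do test it yourself, the shared pool keeps per-cursor statistics, so you can compare the two variants directly instead of eyeballing elapsed times (assumes SELECT privilege on V$SQL; run both queries first without flushing the shared pool):

```sql
-- CPU_TIME and ELAPSED_TIME are in microseconds, cumulative over all executions
select sql_id, executions, cpu_time, elapsed_time, buffer_gets
from   v$sql
where  lower(sql_text) like '%case%when c1%from t1%'
and    parsing_schema_name = 'TESTUSER';
```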
