oracle json_value vs. json_table explain query plan-CodePudding

I have a query that it's been written a while ago. Basically a Materialized View that uses json_table function.

Recently since we moved to Oracle 19c that MV sometimes works and other times doesn't. I rewrite that query by using oracle json_value function. Looking at the query plan, I see that the query that is using json_table is much slower but I don't understand all that data.

Can someone explain what means the bytes, CPU, time etc.

This is using json_value

EXPLAIN PLAN for
SELECT
    JSON_VALUE(response, '$.ErrorRecord[0].xNumber')                   xNumber,
    JSON_VALUE(response, '$.ErrorRecord[0]."error field"')             ERROR_FIELD,
    JSON_VALUE(response, '$.ErrorRecord[0]."value of field in error"') VALUE_OF_FIELD_IN_ERROR,
    JSON_VALUE(response, '$.ErrorRecord[0]."error description"')       ERROR_DESCRIPTION,
    JSON_VALUE(request, '$.Status')                                    STATUS,
    sf.sv_code                                                        CENTER,
    TO_CHAR(arr.created_date_time, 'YYYYMMDD' )                        DATE_OCCURANCE
    
FROM
    aud_request_response arr , 
    person p,
    rep_mapper_svc_fco sf,
    rep_mapper_interface_error re
WHERE 
    JSON_VALUE(response, '$.ErrorRecord[0].xNumber') = p.registration_number ( )
    AND arr.response.Status = 'Error'
    AND arr.request.interfaceName = 'CLAIMS'
    AND JSON_VALUE(request, '$.DataRecord[0].ACO') = sf.fco_code( )
    AND arr.request.interfaceName = re.interface_name 
    AND coalesce(sf.svc_code,'ATH')
        IN ('XS','YS','XZ','ZS','ASD')
GROUP BY 
    sf.sv_code,
    JSON_VALUE(request, '$.DataRecord[0].ACO'),
    arr.request.interfaceName,
    JSON_VALUE(request, '$.Status'), 
    JSON_VALUE(response, '$.ErrorRecord[0]."error field"'),
    arr.created_date_time,
    arr.updated_date_time,
    JSON_VALUE(response, '$.ErrorRecord[0]."value of field in error"'),
    JSON_VALUE(response, '$.ErrorRecord[0]."error description"'),
    JSON_VALUE(response, '$.ErrorRecord[0].xNumber') ;

SELECT * FROM table(DBMS_XPLAN.DISPLAY);

Plan hash value: 241534218
 
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                               |  1094 |   871K|       |  1877K  (1)| 00:01:14 |
|   1 |  HASH GROUP BY           |                               |  1094 |   871K|  4688K|  1877K  (1)| 00:01:14 |
|   2 |   NESTED LOOPS OUTER     |                               |  5259 |  4190K|       |  1877K  (1)| 00:01:14 |
|*  3 |    FILTER                |                               |       |       |       |            |          |
|*  4 |     HASH JOIN RIGHT OUTER|                               |  5259 |  4139K|       |  1866K  (1)| 00:01:13 |
|   5 |      TABLE ACCESS FULL   | REP_MAPPER_SVC_FCO         |    85 |   680 |       |     3   (0)| 00:00:01 |
|*  6 |      HASH JOIN RIGHT SEMI|                               |  5259 |  4098K|       |  1866K  (1)| 00:01:13 |
|*  7 |       TABLE ACCESS FULL  | REP_MAPPER_INTERFACE_ERROR |    33 |   363 |       |     5   (0)| 00:00:01 |
|*  8 |       TABLE ACCESS FULL  | AUD_REQUEST_RESPONSE        |  5259 |  4041K|       |  1866K  (1)| 00:01:13 |
|*  9 |    INDEX UNIQUE SCAN     | PER_ANBR_IDX               |     1 |    10 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

By using json_table

EXPLAIN PLAN for
SELECT
    jtresponse.xNumber as xNumber,
    jtresponse.error_field as ERROR_FIELD,
    replace(jtresponse.value_of_field_in_error, ',interfaceName=INTERFACES','') as VALUE_OF_FIELD_IN_ERROR,
    jtresponse.error_description as ERROR_DESCRIPTION,
    trim(arr.response.Status) as STATUS,
    sf.sv_code  as CENTER,
    TO_CHAR(arr.created_date_time, 'YYYYMMDD' ) as DATE_OCCURANCE
from 
    aud_request_response arr,
    person p,
    rep_mapper_svc_fco sf,
    rep_mapper_interface_error re,
    json_table(response, '$'
        COLUMNS (
        nested path '$.ErrorRecord[*]' columns (
            aNumber path '$.xNumber' null on error,
            error_field path '$."error field"' null on error,
            value_of_field_in_error path '$."value of field in error"' null on error,
            error_description path '$."error description"' null on error
    ))) jtresponse
    ,json_table(request, '$'
        COLUMNS (
        nested path '$.DataRecord[*]' columns (
            fileControl path '$.ACO' null on error
    ))) jtrequest
where  jtrequest.fileControl =sf.fco_code( )
    and arr.request.interfaceName = 'CLAIMS'
    and arr.request.interfaceName = re.interface_name
    and jtresponse.xNumber = p.registration_number ( )
    and arr.response.Status='Error'
    and coalesce(sf.sv_code,'ATH') in('XS','YS','XZ','ZS','ASD')
GROUP BY 
    sv_code,
    jtrequest.fileControl,
    arr.request.interfaceName,
    arr.response.Status, 
    jtresponse.error_field,
    arr.created_date_time,
    arr.updated_date_time,
    jtresponse.value_of_field_in_error,
    jtresponse.error_description,
    jtresponse.xNumber;

SELECT * FROM table(DBMS_XPLAN.DISPLAY);

Plan hash value: 834586449
 
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                               |   350G|   260T|       |  1908M  (2)| 20:42:27 |
|   1 |  HASH GROUP BY             |                               |   350G|   260T|   290T|  1908M  (2)| 20:42:27 |
|   2 |   NESTED LOOPS             |                               |   350G|   260T|       |  1168M  (1)| 12:40:35 |
|*  3 |    FILTER                  |                               |       |       |       |            |          |
|*  4 |     HASH JOIN RIGHT OUTER  |                               |    42M|    32G|       |  2009K  (1)| 00:01:19 |
|   5 |      TABLE ACCESS FULL     | REP_MAPPER_SVC_FCO         |    85 |   680 |       |     3   (0)| 00:00:01 |
|   6 |      NESTED LOOPS          |                               |    42M|    32G|       |  2009K  (1)| 00:01:19 |
|*  7 |       HASH JOIN RIGHT SEMI |                               |  5259 |  4098K|       |  1866K  (1)| 00:01:13 |
|*  8 |        TABLE ACCESS FULL   | REP_MAPPER_INTERFACE_ERROR |    33 |   363 |       |     5   (0)| 00:00:01 |
|*  9 |        TABLE ACCESS FULL   | AUD_REQUEST_RESPONSE        |  5259 |  4041K|       |  1866K  (1)| 00:01:13 |
|  10 |       JSONTABLE EVALUATION |                               |       |       |       |            |          |
|  11 |    JSONTABLE EVALUATION    |                               |       |       |       |            |          |
--------------------------------------------------------------------------------------------------------------------

Thank you!

CodePudding user response：

First up: the two queries are not equivalent!

The json_value query gets the first entries in the DataRecord and ErrorRecord arrays. With json_table the database generates a row for each element in the array.

I see no join between jtrequest and jtresponse. So the query is generating the Cartesian product of these arrays. i.e. it's creating a row for every element from the first array combined with every element from the second for each document.

The rows/bytes/time columns are all estimates. The optimizer thinks this is how many rows/size data/query duration based on the table stats.

The top line in the plan is what's (estimated) the query will return. So for json_table, it's estimating:

350G => 350 billion rows
260T => 260 terabytes of data
20:42:27 => 20 hours of runtime

These figures could be wrong for many reasons, but even if they're over by a factor of 1000x you're still looking at huge amounts of data.

I think you need to figure out the purpose of the original query - in particular why it's generating the Cartesian product of the two arrays. This quickly increases the data volumes.