I have a query that it's been written a while ago. Basically a Materialized View that uses json_table function.
Recently since we moved to Oracle 19c that MV sometimes works and other times doesn't. I rewrite that query by using oracle json_value function. Looking at the query plan, I see that the query that is using json_table is much slower but I don't understand all that data.
Can someone explain what means the bytes, CPU, time etc.
This is using json_value
EXPLAIN PLAN for
SELECT
JSON_VALUE(response, '$.ErrorRecord[0].xNumber') xNumber,
JSON_VALUE(response, '$.ErrorRecord[0]."error field"') ERROR_FIELD,
JSON_VALUE(response, '$.ErrorRecord[0]."value of field in error"') VALUE_OF_FIELD_IN_ERROR,
JSON_VALUE(response, '$.ErrorRecord[0]."error description"') ERROR_DESCRIPTION,
JSON_VALUE(request, '$.Status') STATUS,
sf.sv_code CENTER,
TO_CHAR(arr.created_date_time, 'YYYYMMDD' ) DATE_OCCURANCE
FROM
aud_request_response arr ,
person p,
rep_mapper_svc_fco sf,
rep_mapper_interface_error re
WHERE
JSON_VALUE(response, '$.ErrorRecord[0].xNumber') = p.registration_number ( )
AND arr.response.Status = 'Error'
AND arr.request.interfaceName = 'CLAIMS'
AND JSON_VALUE(request, '$.DataRecord[0].ACO') = sf.fco_code( )
AND arr.request.interfaceName = re.interface_name
AND coalesce(sf.svc_code,'ATH')
IN ('XS','YS','XZ','ZS','ASD')
GROUP BY
sf.sv_code,
JSON_VALUE(request, '$.DataRecord[0].ACO'),
arr.request.interfaceName,
JSON_VALUE(request, '$.Status'),
JSON_VALUE(response, '$.ErrorRecord[0]."error field"'),
arr.created_date_time,
arr.updated_date_time,
JSON_VALUE(response, '$.ErrorRecord[0]."value of field in error"'),
JSON_VALUE(response, '$.ErrorRecord[0]."error description"'),
JSON_VALUE(response, '$.ErrorRecord[0].xNumber') ;
SELECT * FROM table(DBMS_XPLAN.DISPLAY);
Plan hash value: 241534218
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1094 | 871K| | 1877K (1)| 00:01:14 |
| 1 | HASH GROUP BY | | 1094 | 871K| 4688K| 1877K (1)| 00:01:14 |
| 2 | NESTED LOOPS OUTER | | 5259 | 4190K| | 1877K (1)| 00:01:14 |
|* 3 | FILTER | | | | | | |
|* 4 | HASH JOIN RIGHT OUTER| | 5259 | 4139K| | 1866K (1)| 00:01:13 |
| 5 | TABLE ACCESS FULL | REP_MAPPER_SVC_FCO | 85 | 680 | | 3 (0)| 00:00:01 |
|* 6 | HASH JOIN RIGHT SEMI| | 5259 | 4098K| | 1866K (1)| 00:01:13 |
|* 7 | TABLE ACCESS FULL | REP_MAPPER_INTERFACE_ERROR | 33 | 363 | | 5 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | AUD_REQUEST_RESPONSE | 5259 | 4041K| | 1866K (1)| 00:01:13 |
|* 9 | INDEX UNIQUE SCAN | PER_ANBR_IDX | 1 | 10 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
By using json_table
EXPLAIN PLAN for
SELECT
jtresponse.xNumber as xNumber,
jtresponse.error_field as ERROR_FIELD,
replace(jtresponse.value_of_field_in_error, ',interfaceName=INTERFACES','') as VALUE_OF_FIELD_IN_ERROR,
jtresponse.error_description as ERROR_DESCRIPTION,
trim(arr.response.Status) as STATUS,
sf.sv_code as CENTER,
TO_CHAR(arr.created_date_time, 'YYYYMMDD' ) as DATE_OCCURANCE
from
aud_request_response arr,
person p,
rep_mapper_svc_fco sf,
rep_mapper_interface_error re,
json_table(response, '$'
COLUMNS (
nested path '$.ErrorRecord[*]' columns (
aNumber path '$.xNumber' null on error,
error_field path '$."error field"' null on error,
value_of_field_in_error path '$."value of field in error"' null on error,
error_description path '$."error description"' null on error
))) jtresponse
,json_table(request, '$'
COLUMNS (
nested path '$.DataRecord[*]' columns (
fileControl path '$.ACO' null on error
))) jtrequest
where jtrequest.fileControl =sf.fco_code( )
and arr.request.interfaceName = 'CLAIMS'
and arr.request.interfaceName = re.interface_name
and jtresponse.xNumber = p.registration_number ( )
and arr.response.Status='Error'
and coalesce(sf.sv_code,'ATH') in('XS','YS','XZ','ZS','ASD')
GROUP BY
sv_code,
jtrequest.fileControl,
arr.request.interfaceName,
arr.response.Status,
jtresponse.error_field,
arr.created_date_time,
arr.updated_date_time,
jtresponse.value_of_field_in_error,
jtresponse.error_description,
jtresponse.xNumber;
SELECT * FROM table(DBMS_XPLAN.DISPLAY);
Plan hash value: 834586449
--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 350G| 260T| | 1908M (2)| 20:42:27 |
| 1 | HASH GROUP BY | | 350G| 260T| 290T| 1908M (2)| 20:42:27 |
| 2 | NESTED LOOPS | | 350G| 260T| | 1168M (1)| 12:40:35 |
|* 3 | FILTER | | | | | | |
|* 4 | HASH JOIN RIGHT OUTER | | 42M| 32G| | 2009K (1)| 00:01:19 |
| 5 | TABLE ACCESS FULL | REP_MAPPER_SVC_FCO | 85 | 680 | | 3 (0)| 00:00:01 |
| 6 | NESTED LOOPS | | 42M| 32G| | 2009K (1)| 00:01:19 |
|* 7 | HASH JOIN RIGHT SEMI | | 5259 | 4098K| | 1866K (1)| 00:01:13 |
|* 8 | TABLE ACCESS FULL | REP_MAPPER_INTERFACE_ERROR | 33 | 363 | | 5 (0)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | AUD_REQUEST_RESPONSE | 5259 | 4041K| | 1866K (1)| 00:01:13 |
| 10 | JSONTABLE EVALUATION | | | | | | |
| 11 | JSONTABLE EVALUATION | | | | | | |
--------------------------------------------------------------------------------------------------------------------
Thank you!
CodePudding user response:
First up: the two queries are not equivalent!
The json_value
query gets the first entries in the DataRecord
and ErrorRecord
arrays. With json_table
the database generates a row for each element in the array.
I see no join between jtrequest
and jtresponse
. So the query is generating the Cartesian product of these arrays. i.e. it's creating a row for every element from the first array combined with every element from the second for each document.
The rows/bytes/time columns are all estimates. The optimizer thinks this is how many rows/size data/query duration based on the table stats.
The top line in the plan is what's (estimated) the query will return. So for json_table
, it's estimating:
- 350G => 350 billion rows
- 260T => 260 terabytes of data
- 20:42:27 => 20 hours of runtime
These figures could be wrong for many reasons, but even if they're over by a factor of 1000x you're still looking at huge amounts of data.
I think you need to figure out the purpose of the original query - in particular why it's generating the Cartesian product of the two arrays. This quickly increases the data volumes.