Which metrics to compare when evaluating SQL query performance?


I recently watched an online course about Oracle SQL performance tuning. In the video, the lecturer constantly compares the COST value from Autotrace when comparing the performance of two queries.

But I've also read on other forums and websites that COST is a relative value specific to that query and should not be used as an absolute metric for evaluating performance. They suggest looking at things like consistent gets, physical reads, etc. instead.

So my interpretation is that it makes no sense to compare COST values across completely different queries written for different purposes, because the COST value is relative. But when comparing two versions of the same query, one of which has been slightly modified for "better performance", it is okay to compare the COST values. Is my interpretation accurate?

When is it okay to compare the COST value as opposed to some other metric?

What other metrics should we look at when evaluating/comparing query performance?

CodePudding user response:

As with most things in engineering, it really comes down to why you are comparing and what you are evaluating for.

COST is a general time-based estimate that Oracle uses as the ranking metric in its internal optimiser. This answer explains that selection process pretty well.

In general, COST is a reasonable way to compare the expected computation time of two queries, since it expresses the estimated time cost of the query as a number of block reads. So if you are comparing two versions of the same query and you are optimising for time, COST is a fine metric to use.
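
If you want to see that COST figure outside of Autotrace, here is a minimal sketch using EXPLAIN PLAN and DBMS_XPLAN (the EMPLOYEES table and DEPARTMENT_ID filter are just placeholders):

-- Ask the optimizer for its estimated plan; no rows are actually fetched.
explain plan for
select * from employees where department_id = 10;

-- Display the plan, including the Cost (%CPU) column that Autotrace reports.
select * from table(dbms_xplan.display);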

However, if your query or system is bottlenecked or constrained on something other than time (e.g. memory efficiency), then COST will be a poor metric to optimise against. In those cases, you should pick a metric that is relevant to your end goal.
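
For example, in SQL*Plus you can look at the run-time statistics the question mentions (consistent gets, physical reads, and so on) instead of the estimated COST; a minimal sketch, assuming you have the autotrace/PLUSTRACE privileges:

-- Run the query but report only its execution statistics, not the result set.
set autotrace traceonly statistics

select count(*) from all_objects;

-- The output includes consistent gets, physical reads, redo size, sorts, etc.
set autotrace off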

CodePudding user response:

In general, I would be very wary about comparing the cost between two queries unless you have a very specific reason to believe that makes sense.

In general, people don't look at the 99.9% of queries for which the optimizer produces a (nearly) optimal plan. People look at queries where the optimizer has produced a decidedly sub-optimal plan. The optimizer will produce a sub-optimal plan for one of two basic reasons: either it can't transform the query into a form it can optimize (in which case a human likely needs to rewrite the query), or the statistics it uses to make its estimates are incorrect, so what it thinks is an optimal plan is not. (Of course, there are other reasons queries might be slow; perhaps the optimizer produced an optimal plan, but that plan does a table scan because an index is missing, for example.)

If I'm looking at a query that is slow, the query seems to be reasonably well-written, and a reasonable set of indexes is available, statistics are the most likely source of problems. Since cost is computed entirely from those statistics, incorrect statistics mean the optimizer's cost estimates are incorrect too, and when they are incorrect the cost is roughly equally likely to be wrongly high or wrongly low.

If I look at the query plan for a query that I know needs to aggregate hundreds of thousands of rows to produce a report and I see that the optimizer has assigned it a single-digit cost, I know that somewhere along the line it is estimating that a step will return far too few rows. In order to tune that query, I'm going to need the cost to go up so that the optimizer's estimates accurately reflect reality. Conversely, if I look at the query plan for a query I know should only need to scan a handful of rows and I see a cost in the tens of thousands, I know that the optimizer is estimating that some step will return far too many rows. In order to tune that query, I'm going to need the cost to go down so that the optimizer's estimates reflect reality.
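
If stale or missing statistics turn out to be the problem, re-gathering them is usually the first fix to try; a minimal sketch, where SALES is a placeholder table in your own schema:

-- Re-gather optimizer statistics for one table (the table name is a placeholder).
begin
  dbms_stats.gather_table_stats(
    ownname => user,      -- current schema
    tabname => 'SALES',
    cascade => true);     -- also refresh statistics on the table's indexes
end;
/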

If you use the gather_plan_statistics hint, you'll see the estimated and actual row counts in your query plan. If the optimizer's estimates are close to reality, the plan is likely to be pretty good and cost is likely to be reasonably accurate. If the optimizer's estimates are off, the plan is likely to be poor and the cost is likely to be wrong. Trying to use a cost metric to tune a query without first confirming that the cost is reasonably close to reality is seldom very productive.

Personally, I would ignore cost and focus on metrics that are likely to be stable over time and that are actually correlated with performance. My bias would be to focus on logical reads, since most systems are I/O bound, but you could use CPU time or elapsed time as well (elapsed time, though, tends not to be particularly stable because it depends on what happens to be in the cache at the time the query is run). If you're looking at a plan, focus on the estimated vs. actual row counts, not on the cost.
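
Those metrics are easy to pull from V$SQL for a statement you have already run; a minimal sketch (the LIKE filter is just a placeholder for however you identify your statement):

-- Cumulative execution statistics per cached statement; divide by EXECUTIONS for averages.
select sql_id,
       buffer_gets,   -- logical reads
       disk_reads,    -- physical reads
       cpu_time,      -- microseconds
       elapsed_time,  -- microseconds
       executions
from   v$sql
where  sql_text like '%some identifying text%';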

CodePudding user response:

The actual run time of a query is by far the most important metric for tuning queries. We can ignore cost and other metrics 99.9% of the time.

If the query is relatively small and fast, we can easily re-run it and find the actual run times with the GATHER_PLAN_STATISTICS hint:

-- Add a hint to the query and re-run it.
select /*+ gather_plan_statistics */ count(*) from all_objects;

-- Find the SQL_ID of your query.
select sql_id, sql_fulltext from gv$sql where lower(sql_text) like '%gather_plan_statistics%';

-- Plug in the SQL_ID to find an execution plan with actual numbers.
select * from table(dbms_xplan.display_cursor(sql_id => 'bbqup7krbyf61', format => 'ALLSTATS LAST'));

If the query was very slow, and we can't easily re-run it, generate a SQL Monitor report. This data is usually available for a few hours after the last execution.

-- Generate a SQL Monitor report.
select dbms_sqltune.report_sql_monitor(sql_id => 'bbqup7krbyf61') from dual;

There are whole books written about interpreting the results. The basics are: first examine the execution plan and focus on the operations with the largest "A-Time". If you want to understand where the query or the optimizer went wrong, compare "E-Rows" with "A-Rows", since the estimated cardinality drives most of the optimizer's decisions.

Example output:

SQL_ID  bbqup7krbyf61, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from all_objects
 
Plan hash value: 3058112905
 
--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                    |      1 |        |      1 |00:00:03.58 |     121K|    622 |       |       |          |
|   1 |  SORT AGGREGATE                           |                    |      1 |      1 |      1 |00:00:03.58 |     121K|    622 |       |       |          |
|*  2 |   FILTER                                  |                    |      1 |        |  79451 |00:00:02.10 |     121K|    622 |       |       |          |
|*  3 |    HASH JOIN                              |                    |      1 |  85666 |  85668 |00:00:00.12 |    1479 |      2 |  2402K|  2402K| 1639K (0)|
|   4 |     INDEX FULL SCAN                       | I_USER2            |      1 |    148 |    148 |00:00:00.01 |       1 |      0 |       |       |          |
...