Home > Blockchain >  Does SQL server optimize what data needs to be read?
Does SQL server optimize what data needs to be read?

Time:04-15

I've been using BigQuery / Spark for a few years, but I'm not very familiar with SQL server.

If I have a query like

with x AS (
   SELECT * FROM bigTable1
),

y AS (SELECT * FROM bigTable2)

SELECT COUNT(1) FROM x

Will SQL Server be "clever" enough to ignore the pointless data fetching?

Note: Due to the configuration of my environment I don't have access to the query planner for troubleshooting.

CodePudding user response:

Like most of the leading professional DBMS's SQL Server has a statistical optimizer that will indeed eliminate datasources that are never used and cannot affect the results.

Note, however, that this does not apply to certain kinds of errors, so if your bigTable1 or bigTable2 do not exist (or you cannot access them), the query will throw a compile error, even though it would never actually use those tables.

CodePudding user response:

SQL Server has certainly and actually the most advanced optimizer of all professionnal RDBMS (IBM DB2, Oracle...).

Before optimizing the algebriser transform the query, that is just a "demand" (not an execution code) into a mathematical formulae known has "algebraic tree". This mathematical object is formulae of relational algebrae that supports the Relational DBMS (a mathematical theory developped by Franck Edgar Codd in the early 70's).

The very first step of optimisation is done at this step by simplifying the mathematical formulae like things you've done with polynomial expression including x and y ( as an example : 2x - 3y = 3x² - 5x 7 <=> y = (3x² - 3x 7 ) / 3).

Query example (from Chris Date "A Cure for Madness") :

With the table :

CREATE TABLE T_MAD (TYP CHAR(4), VAL VARCHAR(16));
INSERT INTO T_MAD VALUES 
('ALFA', 'ted'),('ALFA', 'chris'),('ALFA', 'michael'),
('NUM', '123'),('NUM', '4567'),('NUM', '89');

This query will fail :

SELECT * FROM T_MAD
WHERE VAL > 1000;

Because VAL is a string datatype uncompatible with a number value when comparing in WHERE.

But our table distinguish ALFA values from NUM values. By adding a restriction on the value with the TYP column like this :

SELECT * FROM T_MAD
WHERE TYP = 'NUM' AND VAL > 1000;

My query give the right result...

But all those queries too :

SELECT * FROM T_MAD
WHERE VAL > 1000 AND TYP = 'NUM'; 

SELECT * FROM
(
SELECT * FROM T_MAD WHERE TYP = 'NUM'
) AS T
WHERE VAL > 1000;

SELECT * FROM
(
SELECT * FROM T_MAD WHERE VAL > 1000
) AS T
WHERE TYP = 'NUM';

The last one is very important because the subquery is the first one that fails...

So what happen to this subquery that suddenly don't fail ?

In fact the algebriser rewrites all those query to a simplier form that conducts to a similar (not to say identical) formulae...

Just have a look on the query execution plans, which appear to be strictly equals !

NOTE that non professional DBMS like MySQL, MariaDB or PostGreSQL will fail for the last one.... An optimizer use a very huge set of IT developpers and researchers that open/free cannot mimic !

enter image description here

Second the optimizer have heuristic rules, that applies essentially at the semantic level. The execution plan is simplified when some contradictory conditions appears in the query text...

Just have a look over those two queries :

SELECT * FROM T_MAD WHERE 1 = 2;

SELECT * FROM T_MAD WHERE 1 = 1;

The first one will have no row returned while the second will have all the rows of the table returned... What does the optimizer ? The query execution plan give the answer :

enter image description here

Note that every constrainst (PK, FK, UNIQUE, CHECK) can help the optimizer to simplify the query execution plan to optimize performances !

Third the optimizer will use statistics that are histogram computed on data distribution to predicts how many rows will be manipulated in every step of the query execution plan...

enter image description here

There is much more things to say about SQL Server query optimizer, like the fct that it works reversely from all the other optimizer, and with this technics it can predict the missing indexes since 18 years that all other RDBMS cannot !

PS : sorry to use the french version of SSMS... I am working in France and helps professionnals to optimize there databases !

  • Related