An 18-year veteran programmer shares his SQL optimization experience (criticism welcome)

Time: 09-17

First, before SQL optimization, a word about SQL disasters

SQL disasters show no early symptoms: early in a project, the volume of business data is small, so inefficient SQL has no visible impact on execution, and neither developers, operations, nor business staff can tell good SQL from bad.
The consequences of a SQL disaster are paralyzing: a SQL disaster usually has a wide blast radius, affecting not just one function module but whole business projects, even the entire database. In our company's time-critical businesses such as logistics warehousing, production orders, and sales orders, it can cause the company direct economic losses, and someone will be held liable.

A SQL disaster calls for planning ahead: a SQL disaster is like a new virus. For a system as large and complex as ours, once symptoms show, a SQL disaster is like the novel coronavirus breaking out in a ten-million-person city like Wuhan: even diagnosing it is hard. So prevention matters most. Drawing on individual and team experience, plan ahead: when in doubt, ask more and learn more.

Second, some SQL optimization methods

1. Some principles of indexing:

1) Index width

Index width is the number of bytes the index key occupies. Two factors determine it: how many columns the index references, and the data types of those columns. In principle, an index should stay relatively narrow: reference as few columns as possible, and keep each column's data type as compact as possible.

Columns of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, varchar(max), nvarchar(max), varbinary(max), and xml columns can participate in a nonclustered index as non-key (included) columns, though even that is not recommended.
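As a hedged illustration of that non-key option (the table and index names below are invented), SQL Server's INCLUDE clause is how a large-value column rides along in a nonclustered index without becoming a key column:

CREATE TABLE T_Doc (DocID int NOT NULL, DocNo varchar(20) NOT NULL, Body nvarchar(max))
-- Body cannot be an index key column, but it can be carried as a
-- non-key (included) column so lookups on DocNo avoid extra page reads:
CREATE NONCLUSTERED INDEX IX_T_Doc_DocNo ON T_Doc (DocNo) INCLUDE (Body)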

When designing, pay attention to the following data-type choices:

(1) date type vs. date/time types

Older versions of SQL Server had only the date/time types datetime (1753-01-01 00:00:00.000 through 9999-12-31 23:59:59.997, 8 bytes) and smalldatetime (1900-01-01 through 2079-06-06, accurate to the minute, 4 bytes). SQL Server 2008 added several new date types, including date (0001-01-01 through 9999-12-31, 3 bytes) and time (supporting fractional-second precision from 0 to 7, like the datetime2 format; its disk overhead is 3 to 5 bytes). If a column is usually queried only by date, it is advisable to split the business datetime into two columns, date and time, and build the index on the date column.
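A minimal sketch of that split, with invented table and column names; the date column is only 3 bytes wide, so the index on it stays narrow:

CREATE TABLE T_Order (OrderNID int IDENTITY(1,1) NOT NULL PRIMARY KEY, CreateDate date NOT NULL, CreateTime time(0) NOT NULL)
CREATE NONCLUSTERED INDEX IX_T_Order_CreateDate ON T_Order (CreateDate)
-- A date-only query can now seek the narrow index directly:
SELECT OrderNID FROM T_Order WHERE CreateDate = '2020-03-06'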

(2) Integer vs. character

For numbers such as a customer ID, if there is no special business requirement, an integer is the best choice: int covers -2,147,483,648 to 2,147,483,647 in 4 bytes, while a character type covering the same range needs 10 bytes. For numbers with a small range, smallint is also a good choice: it covers -32,768 to 32,767 in only 2 bytes.

(3) uniqueidentifier (GUID) column vs. IDENTITY column

Some programmers want a unique identifier in every row, so they use a GUID as the identifier. If a column is given the uniqueidentifier type in the table definition, its values are GUIDs and occupy 16 bytes each:

CREATE TABLE Table1 (MyID uniqueidentifier, MyName varchar(10))

INSERT INTO Table1 VALUES (NEWID(), 'noname')
SELECT * FROM Table1

From a program-design standpoint there is nothing wrong with this, but if an index is built on the column (especially a clustered index), it can hurt performance badly; switching to an IDENTITY column is recommended.

First, a GUID takes 16 bytes, while an IDENTITY column of type int takes only 4; by comparison, the latter narrows the index width by 12 bytes.

Second, GUIDs are random, which causes severe index page splits; IDENTITY values usually grow sequentially, so they do not cause excessive page splitting.
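For contrast, a minimal sketch of the IDENTITY version of the same hypothetical table:

CREATE TABLE Table2 (MyID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, MyName varchar(10))
-- MyID values are assigned sequentially (1, 2, 3, ...), so new rows land
-- at the end of the clustered index and page splits stay rare:
INSERT INTO Table2 (MyName) VALUES ('noname')
SELECT * FROM Table2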

(4) Clustered primary keys for master data: IDENTITY column vs. varchar(10) number column

Some programmers want to use the master-data number as the primary key and then record that number in a field of each document (detail) table. Compared with using an IDENTITY column as the master table's primary key and writing its value into the documents, recording the number looks more intuitive, and most programmers prefer it. That may never have caused trouble at the small companies or software shops they came from, but a primary key index on a varchar(10) number column is nearly 20 times slower than one on an IDENTITY column. In our company, where data volume is usually in the millions, a 20-fold efficiency loss on the primary key can have disastrous consequences.

(5) The T_Code_info disaster: in some databases this table has already reached 5,000 rows, and its index efficiency is very low. The table has caused many problems, and it will later have to be split, stored by category, into two master-detail tables, T_Code_head and T_Code_item, to optimize it.

2. Indexing experience for special situations:

1) For large tables with frequent inserts and updates, creating the clustered index correctly is especially important; otherwise, once the data volume grows, the impact is severe. Experience says the clustered index / clustered primary key should be a unique, ever-increasing identity field or an ordered column. This markedly improves insert timeliness, shortens how long the table stays locked, and raises the execution efficiency of every query statement that touches the table. The company's existing barcode tables are a negative example of this.

Convention: for any table that accumulates more than 5 million records a year, an incremental field must be used as the clustered index / clustered primary key; a composite index over multiple fields must never be used as the clustered primary key / clustered index.

Note: for large tables, using an incremental field as the clustered index and enforcing the multi-field composite key through a (nonclustered) primary key constraint is the optimal solution for tables with frequent inserts and updates.

Convention: every master-data table must use an auto-incrementing int/bigint NID column as its clustered-index primary key. In master-detail document tables, the master table must likewise use an auto-incrementing int/bigint NID as its clustered index, and the detail table must carry the NID field as the field joining it to the master table.

(The table and index advice here draws on the experience of a metering system that writes to its data tables every second; a sketch of the convention follows.)
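A sketch of these conventions under invented names (T_Order_head / T_Order_item): the auto-increment NID is the clustered primary key, the business number keeps its own nonclustered unique index, and the detail table joins back through HeadNID:

-- Master table: incremental NID as the clustered primary key.
CREATE TABLE T_Order_head (
    NID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    OrderNo varchar(20) NOT NULL UNIQUE -- business number, nonclustered unique index
)
-- Detail table: its own incremental NID as the clustered primary key,
-- plus HeadNID as the field joining it to the master table.
CREATE TABLE T_Order_item (
    NID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    HeadNID int NOT NULL REFERENCES T_Order_head (NID),
    ItemNo smallint NOT NULL
)
CREATE NONCLUSTERED INDEX IX_T_Order_item_HeadNID ON T_Order_item (HeadNID)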

2) The primary key: apart from the clustered index, the primary key is the fastest of all indexes. Every one of our tables needs a primary key, whether a single-column or composite index (note that primary-key fields cannot contain NULL values).

3) An important index-design principle: prefer single-column indexes over composite indexes, because a single-column index is often more effective than a composite one.

Do not build composite indexes unless necessary (the primary key aside), because a composite index only takes effect when used in one direction, in its column order, as the sketch below shows. If you don't know or don't remember that order, then after the project goes live and the data volume grows, you may discover that the composite index never actually takes effect for your queries, which can lead straight to a SQL disaster.
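A minimal sketch of that order dependency (table and column names invented):

CREATE TABLE T_Stock (WarehouseID int NOT NULL, SkuID int NOT NULL, Qty int NOT NULL)
-- Composite index: WarehouseID is the leading column.
CREATE NONCLUSTERED INDEX IX_T_Stock_WhSku ON T_Stock (WarehouseID, SkuID)
-- Can seek the index: the leading column appears in the condition.
SELECT Qty FROM T_Stock WHERE WarehouseID = 3 AND SkuID = 1001
-- Cannot seek the index: the leading column is missing, so this
-- typically falls back to a scan.
SELECT Qty FROM T_Stock WHERE SkuID = 1001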



4) Index columns must not participate in computation; keep the indexed columns in a query "clean". For example, cast(create_time as date) = '2020-03-06' cannot use the index. The reason is simple: the B+ tree stores the field values as they exist in the table, so to compare, the engine would have to apply the function to every element first, and that cost is plainly too high. The statement must therefore be written as: create_time >= cast('2020-03-06 00:00:00' as datetime) and create_time < cast('2020-03-07 00:00:00' as datetime)

5) Avoid testing fields for NULL in the where clause whenever possible; otherwise the engine will abandon the index and do a full table scan.
The case here was a statement from an old colleague of mine that defeated several optimization attempts. The problem lay in NULL values in a subquery join: the main query used IS NULL directly on the column as a query condition, which hurt efficiency badly. To avoid this, convert the nullable fields inside the subquery to some special value, e.g. NVL(ParentNO, 'ISNULL') as ParentNO in the subquery, then ParentNO <> 'ISNULL' in the main query.
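A minimal sketch of that replacement (NVL is Oracle's function; ISNULL or COALESCE is the SQL Server equivalent; table and column names invented):

CREATE TABLE T_Node (NodeNO varchar(10) NOT NULL, ParentNO varchar(10) NULL)
-- Fold NULL parents into a sentinel value inside the subquery,
-- so the outer query can use an ordinary comparison instead of IS NULL:
SELECT s.NodeNO
FROM (SELECT NodeNO, ISNULL(ParentNO, 'ISNULL') AS ParentNO FROM T_Node) s
WHERE s.ParentNO <> 'ISNULL'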
6) Avoid using the != or <> operators in the where clause whenever possible; otherwise the engine will abandon the index and do a full table scan. Wherever you can, replace such conditions with = (e.g. a Boolean-style flag), and where you cannot, place them as late in the where clause as possible.
A problem this leads to: much of the time, to save trouble, we define status fields as char(2), which also hurts query efficiency badly; this is currently common in our document tables.

Solution: change such flags to Boolean-style smallint or int values and maintain the status display names in a system configuration table. For query convenience, a status group identifier can be kept in the configuration table to improve query efficiency and reduce the <> or IN conditions in queries, as sketched below.
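A sketch of that configuration under invented names: status becomes an int, display names live in the configuration table, and a group identifier turns a chain of <> or IN conditions into one equality:

CREATE TABLE T_Status_config (Status int NOT NULL PRIMARY KEY, StatusName nvarchar(20) NOT NULL, StatusGroup int NOT NULL)
CREATE TABLE T_Doc_order (NID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, Status int NOT NULL)
-- e.g. StatusGroup 1 = all "open" statuses; one equality on the group
-- replaces a chain like Status <> 90 AND Status <> 99:
SELECT o.NID
FROM T_Doc_order o
JOIN T_Status_config c ON c.Status = o.Status
WHERE c.StatusGroup = 1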
4. Avoid using or to join conditions in the where clause, otherwise the engine may abandon the index and do a full table scan, e.g.:
select id from t where num=10 or num=20
Query like this instead:
select id from t where num=10
union all
select id from t where num=20

5. in and not in should also be used with caution, otherwise they lead to a full table scan, e.g.:
select id from t where num in (1, 2, 3)
For continuous values, use between instead of in:
select id from t where num between 1 and 3

6. The following query also leads to a full table scan:
select id from t where name like '%ABC%'

7. Avoid expression operations on fields in the where clause whenever possible; they make the engine abandon the index and do a full table scan, e.g.:
select id from t where num/2=100
Should be changed to:
select id from t where num=100*2

8. Avoid function operations on fields in the where clause whenever possible; they make the engine abandon the index and do a full table scan, e.g.:
select id from t where substring(name, 1, 3)='ABC' -- ids where name begins with ABC
Should be changed to:
select id from t where name like 'ABC%'

9. Do not apply functions, arithmetic, or other expression operations on the left side of "=" in the where clause; otherwise the system may be unable to use the index correctly.

10. When using an indexed field as a condition, if the index is a composite index, the condition must use the first field of the index to guarantee the system uses it; otherwise the index will not be used. The condition fields should also be kept consistent with the index's column order as far as possible.

11. Don't write meaningless queries, for example generating an empty table structure:
select col1, col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources; it should be written as: create table #t(...)