Optimizing a MySQL table with 20 million rows: the process, plus three solutions
2019-02-21 14:28:26
Summary of problems
The system uses Alibaba Cloud RDS for MySQL (MySQL 5.6). One user table accumulated nearly 20 million rows of data in six months, and the retained data has recently reached about 40 million rows. Queries are extremely slow and the database stalls every day, seriously affecting the business.
Background: this is an old system. The people who designed it had probably not even left school yet; both the table design and the SQL are a mess, so bad the original code is hard to look at. The original developers are long gone and left it to me to maintain. This is the legendary "maintain it after they run away", and I am the one who fell into the pit!
I set out to solve the problem, and this log is the result.
Solution overview
Solution 1: optimize the existing MySQL database. Advantages: no impact on the existing business, no changes to the application code, lowest cost. Disadvantages: optimization has a ceiling; once the data volume grows past it, that is the end of the road.
Solution 2: upgrade to another database that is 100% MySQL-compatible. Advantages: no impact on the existing business, no code changes, and almost no work is needed to gain performance. Disadvantages: it costs more money.
Solution 3: go straight to a big-data solution and switch to a NewSQL/NoSQL database. Advantages: strong scalability, low cost, no ceiling on data volume. Disadvantage: the application code has to be modified.
These three solutions can simply be applied in that order. Below the hundred-million-row level there is no real need to switch to NoSQL; the development cost is too high. I tried all three approaches, turned each into a working solution, and along the way silently "thanked" the developers who ran away ten thousand times :)
Solution 1 in detail: optimize the existing MySQL database
After phone calls with Alibaba Cloud database experts, a round of Googling, and questions in discussion groups, the advice boils down to the following (all of it the good stuff):
1. Consider performance when designing the database and creating the tables
2. Pay attention to optimization when writing SQL
3. Partitioning
4. Splitting tables
5. Splitting databases
1. Consider performance when designing the database and creating the tables
MySQL itself is highly flexible, which means its performance depends heavily on the developer: a capable developer gets high performance out of MySQL, a careless one does not. This is a common weakness of relational databases, and it is why DBAs at large companies are usually paid very well.
Points to watch in table design (a combined sketch follows this list):
- Avoid NULL values in table fields. NULL is hard to optimize in queries and takes extra index space; a default of 0 is recommended instead of NULL.
- Prefer INT over BIGINT, and add UNSIGNED if the value is non-negative (this doubles the usable range); TINYINT, SMALLINT, or MEDIUMINT are better still where they fit.
- Use ENUM or integer types instead of strings.
- Prefer TIMESTAMP over DATETIME where possible.
- Don't put too many fields in a single table; fewer than 20 is recommended.
- Store IP addresses as integers.
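As a minimal sketch of how these rules might look together, here is a hypothetical log table (all names are illustrative, not the original schema):

    -- Hypothetical table showing the rules above: NOT NULL columns with defaults,
    -- small unsigned integer types, ENUM instead of strings, TIMESTAMP instead of
    -- DATETIME, an unsigned INT for the IP address, and well under 20 columns.
    CREATE TABLE t_user_log (
        id         INT UNSIGNED     NOT NULL AUTO_INCREMENT,
        user_id    INT UNSIGNED     NOT NULL DEFAULT 0,
        status     TINYINT UNSIGNED NOT NULL DEFAULT 0,
        gender     ENUM('unknown', 'male', 'female') NOT NULL DEFAULT 'unknown',
        ip         INT UNSIGNED     NOT NULL DEFAULT 0,  -- store INET_ATON('1.2.3.4')
        created_at TIMESTAMP        NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    -- IPs go in and out through the built-in conversion functions:
    INSERT INTO t_user_log (user_id, ip) VALUES (1, INET_ATON('192.168.0.1'));
    SELECT user_id, INET_NTOA(ip) AS ip FROM t_user_log;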
Indexes
- More indexes are not always better; create them specifically for the queries you run, and consider the columns used in WHERE and ORDER BY. Use EXPLAIN to check whether a query uses an index or does a full table scan.
- Avoid NULL checks on fields in the WHERE clause, otherwise the engine gives up on the index and falls back to a full table scan.
- Fields whose values are sparsely distributed (very few distinct values) are not suitable for indexing, e.g. a "gender" field with only two or three possible values.
- For character fields, build prefix indexes only.
- Character fields are best not used as primary keys.
- Skip foreign keys; enforce the constraints in the application.
- Avoid UNIQUE constraints where possible; enforce them in the application.
- When using a multi-column index, keep the column order consistent with the query conditions, and remove single-column indexes the composite index makes redundant.
In short: use appropriate data types and choose appropriate indexes. (A small index sketch follows.)
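A sketch of "index for the queries you actually run", reusing the hypothetical t_user_log table and assuming a frequent query that filters on user_id and sorts by created_at:

    -- One composite index that covers both the WHERE and the ORDER BY
    ALTER TABLE t_user_log ADD INDEX idx_user_created (user_id, created_at);

    -- Check with EXPLAIN that the index is used rather than a full table scan:
    -- type should be ref/range with key = idx_user_created, not type = ALL.
    EXPLAIN SELECT id, status
    FROM t_user_log
    WHERE user_id = 42
    ORDER BY created_at DESC
    LIMIT 20;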
Choose the right data type:
(1) Use the smallest data type that can hold the data: integer < date/time < char/varchar < blob.
(2) Use simple data types; integers are cheaper to process than characters, because string comparison is more complex. For example, store a time in an integer column, or convert an IP to a number with the conversion function.
(3) Use sensible field lengths; fixed-length rows are faster. Prefer ENUM and CHAR over VARCHAR.
(4) Define columns as NOT NULL whenever possible.
(5) Use TEXT as little as possible; if it is unavoidable, it is best to split it into a separate table.
Choose the right index columns:
(1) Columns that are queried frequently: columns appearing in WHERE, GROUP BY, ORDER BY, and ON clauses.
(2) Columns used with <, <=, =, >, >=, BETWEEN, IN, and LIKE with a trailing wildcard ('string%').
(3) Short columns: the smaller the indexed field, the better, because the database's storage unit is the page, and the more entries fit on one page the better.
(4) Columns with high discreteness (many distinct values) should go first in a composite index. You can check discreteness by counting distinct column values: the larger the count, the higher the discreteness. (A sketch of this check follows.)
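The discreteness check in point (4) can be done directly in SQL; a sketch against the hypothetical table:

    -- The closer COUNT(DISTINCT col) is to COUNT(*), the more selective the column
    -- and the further left it belongs in a composite index.
    SELECT COUNT(DISTINCT user_id) AS distinct_user_id,
           COUNT(DISTINCT status)  AS distinct_status,
           COUNT(*)                AS total_rows
    FROM t_user_log;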
The original developers have run off and the tables already exist, so I cannot change them. Conclusion: not executable here, give up!
2. Pay attention to optimization when writing SQL
- Use LIMIT to cap the number of rows a query returns.
- Avoid SELECT *; list the fields you actually need.
- Use joins (JOIN) instead of subqueries.
- Break up large DELETE or INSERT statements. (See the sketches after this list.)
- Enable the slow query log to find the SQL statements that are slow.
- Don't do arithmetic on columns: SELECT id WHERE age + 1 = 10. Any operation on a column, including database functions and expressions, forces a table scan; move the calculation to the right-hand side of the comparison wherever possible.
- Keep SQL statements as simple as possible: one statement runs on only one CPU; several small statements beat one big one and shorten lock times; a single huge SQL statement can block the whole database.
- Rewrite OR as IN: OR is O(n) while IN is O(log n); keep the number of values in an IN list within 200.
- Avoid functions and triggers; implement the logic in the application.
- Avoid LIKE '%xxx' queries (leading wildcard).
- Use JOIN sparingly.
- Compare values of the same type, e.g. '123' with '123' and 123 with 123.
- Try to avoid != or <> operators in the WHERE clause, otherwise the engine gives up on the index and does a full table scan.
- For continuous ranges of values use BETWEEN instead of IN: SELECT id FROM t WHERE num BETWEEN 1 AND 5.
- Don't fetch the whole table; paginate list data with LIMIT, and keep the page size modest.
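A few of the rewrites above as concrete sketches (table and column names are hypothetical, not taken from the real system):

    -- Break a huge DELETE into small batches so locks are held only briefly;
    -- run this repeatedly until it affects 0 rows.
    DELETE FROM t_user_log WHERE created_at < '2018-01-01' LIMIT 5000;

    -- Keep the column free of arithmetic so an index on it stays usable:
    -- bad:  SELECT id FROM t_user WHERE age + 1 = 10;
    SELECT id FROM t_user WHERE age = 10 - 1;

    -- Rewrite OR as IN (and keep the IN list within 200 values):
    SELECT id FROM t_user WHERE id IN (1, 2, 3);

    -- Use BETWEEN for continuous ranges and LIMIT for paging:
    SELECT id, status FROM t_user_log WHERE id BETWEEN 1000 AND 1500 LIMIT 100;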
The original developers have run off and the program is already finished and in production, so I cannot change the SQL. Conclusion: not executable here, give up!
Engines
Currently the two widely used engines are MyISAM and InnoDB.
MyISAM
MyISAM was the default engine in MySQL 5.1 and earlier. Its characteristics are:
- Does not support row locks; readers lock the entire table they read, and writers take an exclusive lock on the table.
- Does not support transactions.
- Does not support foreign keys.
- Does not support crash-safe recovery.
- Supports inserting new records into a table while it is being read (concurrent inserts).
- Supports indexes on the first 500 characters of BLOB and TEXT columns, and supports full-text indexes.
- Supports delayed index updates, which greatly improves write performance.
- For tables that will no longer be modified, supports compressed tables, which greatly reduces disk usage.
InnoDB
InnoDB became the default engine from MySQL 5.5 onward. Its characteristics are:
- Supports row locks and uses MVCC to support high concurrency.
- Supports transactions.
- Supports foreign keys.
- Supports crash-safe recovery.
- Does not support full-text indexes (InnoDB gained full-text support only in MySQL 5.6).
In general, MyISAM suits SELECT-intensive tables, while InnoDB suits INSERT- and UPDATE-intensive tables.
MyISAM may be extremely fast and take little storage space, but the application needs transaction support, so InnoDB it must be. Conclusion: this plan cannot be executed either, give up!
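A minimal sketch for checking which engine the existing tables use and converting one if needed (the table name is hypothetical; rebuilding a table with tens of millions of rows should be done in a maintenance window):

    -- List the engine of every table in the current schema
    SELECT TABLE_NAME, ENGINE, TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE();

    -- Convert a MyISAM table to InnoDB (this rewrites the entire table)
    ALTER TABLE t_user_log ENGINE = InnoDB;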
3. Partitioning
MySQL introduced partitioning in version 5.1. It is a simple form of horizontal splitting: the user adds partition clauses when creating the table, and it is transparent to the application, requiring no code changes.
To the user, a partitioned table is logically a single table, but underneath it is made up of multiple physical sub-tables. The code that implements partitioning is really a set of object wrappers around the underlying tables, and to the SQL layer it is a black box that completely hides the underlying layer. The way MySQL implements partitioning also means that indexes are defined per sub-table; there is no global index.
The user's SQL still needs to be written with the partitioned table in mind: the query conditions should include the partition column so that the query touches only a small number of partitions, otherwise it will scan all of them. You can use EXPLAIN PARTITIONS to see which partitions a given SQL statement will hit, and then optimize the SQL accordingly. In my tests, queries whose conditions did not include the partition column also got faster, so this measure is worth trying.
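On MySQL 5.6 the check looks like this; a sketch assuming a hypothetical partitioned copy of the log table, t_user_log_p, partitioned by month on created_at (the CREATE statement appears further down):

    -- EXPLAIN PARTITIONS shows which partitions the statement will actually touch
    EXPLAIN PARTITIONS
    SELECT COUNT(*)
    FROM t_user_log_p
    WHERE created_at >= '2019-01-01' AND created_at < '2019-02-01';
    -- If the "partitions" column lists every partition, the WHERE clause is not
    -- restricting on the partition column and the query scans all partitions.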
The benefits of partitioning are:
- A single table can hold more data.
- Partitioned data is easier to maintain: you can delete large amounts of data in bulk by clearing an entire partition, and add new partitions to take newly inserted data; you can also optimize, check, or repair an individual partition. (See the sketch after this list.)
- Some queries can be narrowed down, from the query conditions alone, to only a few partitions, which makes them very fast.
- Partitioned data can be spread across different physical devices, making efficient use of multiple pieces of hardware.
- Partitioned tables can be used to avoid certain special bottlenecks, such as InnoDB's mutually exclusive access to a single index, or inode lock contention on the ext3 filesystem.
- Individual partitions can be backed up and restored.
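The maintenance point above in practice, assuming monthly RANGE partitions named p201812, p201901, ... on the hypothetical t_user_log_p table:

    -- Drop an entire old partition instead of running a huge DELETE
    ALTER TABLE t_user_log_p DROP PARTITION p201812;

    -- Add a partition for the next month's data (works as long as no partition
    -- already covers that range)
    ALTER TABLE t_user_log_p ADD PARTITION (
        PARTITION p201903 VALUES LESS THAN (TO_DAYS('2019-04-01'))
    );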
Limitations and drawbacks of partitioning:
- A table can have at most 1024 partitions.
- If the table has a primary key or unique indexes, the partitioning column must be included in all of them, i.e. every unique key must contain the partition column. (A sketch follows this list.)
- Partitioned tables cannot use foreign key constraints.
- NULL values can defeat partition pruning.
- All partitions must use the same storage engine.
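The primary-key restriction in particular is what forces a composite key; a sketch of a partitioned copy of the hypothetical log table (this is the t_user_log_p referenced in the earlier partition examples):

    -- The partition column must be part of every unique key, including the primary
    -- key, so the key here is (id, created_at) rather than just (id).
    -- DATETIME is used because RANGE partitioning on a TIMESTAMP column only
    -- allows the UNIX_TIMESTAMP() expression.
    CREATE TABLE t_user_log_p (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id    INT UNSIGNED NOT NULL DEFAULT 0,
        created_at DATETIME     NOT NULL,
        PRIMARY KEY (id, created_at)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p201812 VALUES LESS THAN (TO_DAYS('2019-01-01')),
        PARTITION p201901 VALUES LESS THAN (TO_DAYS('2019-02-01')),
        PARTITION p201902 VALUES LESS THAN (TO_DAYS('2019-03-01'))
    );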
Types of partitioning:
- RANGE partitioning: assigns rows to partitions based on column values falling within a given continuous range.
- LIST partitioning: similar to RANGE partitioning, except that LIST partitioning chooses the partition by matching the column value against a set of discrete values. (A LIST sketch follows this list; a RANGE example appears above under the limitations.)
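A RANGE example already appears above; here is the LIST counterpart as a sketch (table name and region codes are hypothetical):

    -- LIST partitioning: rows are placed by matching a column against discrete values
    CREATE TABLE t_orders_by_region (
        id        INT UNSIGNED     NOT NULL,
        region_id TINYINT UNSIGNED NOT NULL,
        PRIMARY KEY (id, region_id)
    ) ENGINE=InnoDB
    PARTITION BY LIST (region_id) (
        PARTITION p_north VALUES IN (1, 2, 3),
        PARTITION p_south VALUES IN (4, 5, 6)
    );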