Database optimization

Time:09-21

Why optimize?
System throughput bottlenecks tend to appear in database access speed.
As the application runs, the data in the database grows and grows, and processing time may slow down correspondingly.
Data is stored on disk, whose read/write speed cannot compare with memory.
Rule of optimization: reduce system bottlenecks, reduce resource usage, and increase response speed.

Optimizing the database structure

A good database design scheme can often achieve twice the result with half the effort for database performance.

It needs to consider data redundancy, the speed of queries and updates, and whether the data types of the fields are reasonable.

Decompose a table with many fields into multiple tables

For a table with many fields, if some fields are used infrequently, those fields can be separated out to form a new table.

When a table holds a large amount of data, queries are slowed down by the presence of these rarely used fields.
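As a rough sketch of this split (using SQLite for illustration; the table and column names are hypothetical), a wide user table whose profile columns are rarely read can be decomposed like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original wide table: login fields are read on every request,
# profile fields only on the profile page.
conn.execute("""CREATE TABLE users_wide (
    id INTEGER PRIMARY KEY, name TEXT, password_hash TEXT,
    bio TEXT, settings TEXT)""")
conn.execute("INSERT INTO users_wide VALUES (1, 'alice', 'x', 'long bio...', '{}')")

# Split: hot columns stay in users; cold columns move to user_profiles,
# linked by the same primary key.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, password_hash TEXT)")
conn.execute("""CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY REFERENCES users(id), bio TEXT, settings TEXT)""")
conn.execute("INSERT INTO users SELECT id, name, password_hash FROM users_wide")
conn.execute("INSERT INTO user_profiles SELECT id, bio, settings FROM users_wide")

# The frequent query now touches only the narrow table.
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(row[0])  # alice
```

The hot query scans far less data per row; the cold columns are only joined in when actually needed.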

Add an intermediate table

For tables that frequently need joint queries, an intermediate table can be created to improve query efficiency.

After establishing the intermediate table, insert the data that the joint query would need into the intermediate table, and then replace the original joint query with a query against the intermediate table.
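A minimal sketch of the idea (SQLite used for illustration; the schema is hypothetical) — the joined, aggregated result is materialized once and then read directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)])

# Intermediate table holding the pre-joined, pre-aggregated result.
conn.execute("""CREATE TABLE customer_order_summary AS
    SELECT c.id AS customer_id, c.name, COUNT(o.id) AS order_count,
           SUM(o.amount) AS total_amount
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name""")

# Reads now hit the intermediate table instead of re-running the join.
row = conn.execute(
    "SELECT order_count, total_amount FROM customer_order_summary WHERE name = 'alice'"
).fetchone()
print(row)  # (2, 15.0)
```

The trade-off, as with any materialized result, is that the intermediate table must be refreshed when the source tables change.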

Add redundant fields

When designing tables, one should follow the conventions of normalization theory and reduce redundant fields as much as possible, which makes the design look delicate and elegant. However, reasonably adding redundant fields can improve query speed.

The higher a table's degree of normalization, the more relationships there are between tables, the more join queries are needed, and the worse the performance.

Note:

When the value of a redundant field is modified in one table, remember to update it in the other tables as well; otherwise data inconsistency problems will arise.
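One way to keep the redundant copy consistent is a trigger on the source table. A sketch (SQLite syntax; the `users`/`orders` schema is hypothetical, and in MySQL the trigger syntax differs slightly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
# orders carries a redundant copy of the user's name to avoid a join.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, user_name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'alice')")

# Trigger that propagates name changes to the redundant column.
conn.execute("""CREATE TRIGGER sync_user_name AFTER UPDATE OF name ON users
    BEGIN
        UPDATE orders SET user_name = NEW.name WHERE user_id = NEW.id;
    END""")

conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
print(conn.execute("SELECT user_name FROM orders WHERE id = 10").fetchone()[0])  # alicia
```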

The MySQL database's CPU usage has reached 500% — how do you deal with it?
When CPU usage soars to 500%, first use the operating system's top command to observe whether mysqld is the process occupying it; if not, find the processes with high usage and handle them accordingly.

If it is caused by mysqld, run SHOW PROCESSLIST and look at the sessions running inside to see whether any resource-consuming SQL is executing. Find the high-consumption SQL and check whether its execution plan is accurate, whether an index is missing, or whether the amount of data is simply too large.

In general, be sure to kill these threads (and observe whether CPU usage drops), then make the corresponding adjustments (such as adding indexes, changing the SQL, or changing memory parameters) and run the SQL again.

It is also possible that each SQL statement consumes few resources, but a large number of sessions suddenly come in and cause the CPU to soar. In that case you need to analyze why the application's connection count surged and make corresponding adjustments, such as limiting the number of connections.
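The actual commands here are MySQL's SHOW PROCESSLIST and KILL <id>. As a hedged sketch of the triage logic only (the processlist snapshot below is made up, and the threshold is an arbitrary assumption), the filtering step looks roughly like:

```python
# Sketch: given a SHOW PROCESSLIST snapshot (simulated here as dicts),
# pick the active sessions worth killing.
processlist = [
    {"Id": 101, "Command": "Query", "Time": 1, "Info": "SELECT 1"},
    {"Id": 102, "Command": "Query", "Time": 340,
     "Info": "SELECT * FROM orders WHERE note LIKE '%x%'"},
    {"Id": 103, "Command": "Sleep", "Time": 600, "Info": None},
]

def long_running_queries(rows, threshold_seconds=60):
    """Active queries running longer than the threshold (Sleep sessions excluded)."""
    return [r for r in rows
            if r["Command"] == "Query" and r["Time"] >= threshold_seconds]

suspects = long_running_queries(processlist)
kill_statements = [f"KILL {r['Id']};" for r in suspects]
print(kill_statements)  # ['KILL 102;']
```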

How do you optimize a large table? A table with nearly ten million rows where CRUD is slow — how to optimize it? How is database and table sharding done? What problems does sharding bring? Have you used middleware? Do you know its principles?
When a single MySQL table has too many records, the database's CRUD performance drops significantly. Some common optimization measures are as follows:

Limit the range of the data: be sure to prohibit query conditions that place no restriction on the data range. For example, when a user queries order history, we can restrict it to within one month.
Read/write separation: the classic database scaling scheme — the primary is responsible for writes, the replicas are responsible for reads.
Caching: use the MySQL query cache; in addition, for heavyweight data that is rarely updated, consider an application-level cache.
There is also optimization through sharding, mainly vertical partitioning and horizontal partitioning.
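The read/write-separation item above can be sketched at the application level. This is an illustrative toy (SQLite connections stand in for a MySQL primary and replicas, and replication is faked by replaying writes — real setups use MySQL's own replication):

```python
import random
import sqlite3

# Application-level read/write splitting: writes go to the primary,
# reads go to a randomly chosen replica.
class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute_write(self, sql, params=()):
        self.primary.execute(sql, params)
        for replica in self.replicas:      # stand-in for real replication
            replica.execute(sql, params)

    def execute_read(self, sql, params=()):
        return random.choice(self.replicas).execute(sql, params).fetchall()

primary = sqlite3.connect(":memory:")
replicas = [sqlite3.connect(":memory:") for _ in range(2)]
router = ReadWriteRouter(primary, replicas)

router.execute_write("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
router.execute_write("INSERT INTO t VALUES (?, ?)", (1, "hello"))
print(router.execute_read("SELECT v FROM t WHERE id = ?", (1,)))  # [('hello',)]
```

In real deployments, replication lag means a read issued immediately after a write may see stale data; critical reads are often routed to the primary for that reason.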

Vertical partitioning:

Split a table according to the correlation of the data in it. For example, if a user table contains both user login information and the user's basic profile, it can be split into two separate tables, which can even be placed in separate databases.

Simply put, vertical splitting partitions a table by columns, turning one table with many columns into multiple tables, as shown in the figure below, which should make it easier to understand.

[figure: one wide table split by columns into several narrower tables]

Advantages of vertical splitting: rows become smaller, which reduces the number of blocks read during a query and hence the number of I/Os. In addition, vertical partitioning can simplify the table structure and ease maintenance.

Disadvantages of vertical splitting: the primary key becomes redundant and the redundant columns must be managed, and it introduces join operations (which can instead be performed at the application layer). In addition, vertical partitioning makes transactions more complicated.

Vertical table splitting
Put the primary key and some of the columns in one table, and the primary key and the other columns in another table.

[figure: the primary key plus different column groups stored in separate tables]

Applicable scenarios
1. Some columns in the table are commonly used, while other columns are rarely used.
2. It can make data rows smaller, so a data page can store more rows, reducing the number of I/Os at query time.
Disadvantages
The splitting strategy is based on logic in the application layer; once that logic changes, the entire splitting scheme changes, so scalability is poor.
It increases development cost at the application layer for the splitting logic.
Redundancy must be managed, and querying all the data requires a join operation.
Horizontal partitioning:

Keep the table structure unchanged and shard the data for storage through some strategy, so that each shard of data ends up in a different table or database, achieving distribution. Horizontal splitting can support very large amounts of data.

Horizontal splitting partitions a table by rows. When the number of rows in a table exceeds 2 million, queries slow down; at that point the data of one table can be split into multiple tables to store. For example, a user information table can be split into multiple user information tables, which avoids the performance impact of a single table holding a very large amount of data.
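A minimal sketch of the routing involved (SQLite for illustration; the shard count, table names, and modulo strategy are all example choices):

```python
import sqlite3

# Rows of one logical user table are spread across N physical tables
# by user_id % N; every access first computes the target table.
SHARD_COUNT = 4
conn = sqlite3.connect(":memory:")
for i in range(SHARD_COUNT):
    conn.execute(f"CREATE TABLE user_{i} (id INTEGER PRIMARY KEY, name TEXT)")

def shard_table(user_id):
    return f"user_{user_id % SHARD_COUNT}"

def insert_user(user_id, name):
    conn.execute(f"INSERT INTO {shard_table(user_id)} VALUES (?, ?)", (user_id, name))

def get_user(user_id):
    return conn.execute(
        f"SELECT name FROM {shard_table(user_id)} WHERE id = ?", (user_id,)
    ).fetchone()

insert_user(5, "alice")   # lands in user_1
insert_user(8, "bob")     # lands in user_0
print(get_user(5), get_user(8))  # ('alice',) ('bob',)
```

Queries that carry the sharding key route to a single table; queries that do not must fan out to every shard.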

Database-level horizontal splitting

Horizontal splitting can support very large amounts of data. What needs to be noted is that splitting tables alone only solves the problem of a single table's data being too large; because those tables are still on the same machine, it does little to improve MySQL's concurrency capability, so horizontal splitting is best combined with splitting across databases.

Splitting supports the storage of very large amounts of data and requires little modification on the application side, but the business is hard to shard cleanly: cross-shard joins perform poorly and the logic is complex.

The author of "The Path to Becoming a Java Engineer" recommends not sharding data if at all possible, because splitting brings complexity in logic, deployment, and operations. In most cases, a properly optimized ordinary table can support data volumes below the ten-million level without much trouble. If sharding is truly unavoidable, try to choose a client-side sharding architecture, which saves a network I/O hop and the middleware.

Horizontal table splitting:
When a table is very large, splitting it can reduce the data pages and index pages that a query needs to read, and also reduces the number of index levels, improving query performance.

[figure: one large table split into several tables by rows]

Applicable scenarios
1. The data in the table has inherent independence — for example, the table records data of various regions or of different periods, especially when some of the data is commonly used and some is not.
2. The data needs to be stored on multiple media.
Disadvantages of horizontal splitting
1. It adds complexity to the application: queries usually need more than one table name, and querying all the data requires a UNION operation.
2. In many database applications, this complexity outweighs the advantages it brings, since a query only saves reading one index level's worth of disk I/O.
Two common solutions for sharding at the database level:

Client proxy: the sharding logic lives on the application side, encapsulated in a jar package and implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Alibaba's TDDL are two commonly used implementations.
Middleware proxy: an agent is added between the application and the data layer, and the sharding logic is maintained uniformly in the middleware service. Mycat, 360's Atlas, NetEase's DDB, and so on are implementations of this architecture.
Problems after sharding

Transaction support: after sharding, a transaction becomes a distributed transaction. Relying on the database's own distributed transaction management to execute it is expensive in terms of performance; having the application control it and form the transaction in program logic places a burden on the programming.

Cross-database joins

Once data is sharded, cross-node join problems are unavoidable, though good design and splitting can reduce how often they occur. The common practice is to solve the problem with two queries: the first query's result set collects the ids of the associated data, and a second request is issued based on these ids to fetch the associated data.
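The two-query pattern can be sketched as follows (two SQLite databases stand in for two shards; the schema is hypothetical):

```python
import sqlite3

# Orders live in one database, users in another, so a SQL join is
# impossible; the application runs two queries and joins in memory.
orders_db = sqlite3.connect(":memory:")
users_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 7), (2, 9), (3, 7)])
users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(7, "alice"), (9, "bob")])

# Query 1: fetch orders and collect the associated user ids.
orders = orders_db.execute("SELECT id, user_id FROM orders").fetchall()
user_ids = sorted({uid for _, uid in orders})

# Query 2: fetch the associated users by id, then stitch together.
placeholders = ",".join("?" * len(user_ids))
users = dict(users_db.execute(
    f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids).fetchall())

joined = [(order_id, users[uid]) for order_id, uid in orders]
print(joined)  # [(1, 'alice'), (2, 'bob'), (3, 'alice')]
```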

Cross-node count, order by, group by, and aggregation functions are a similar class of problem, because they must be computed over the entire data set, and most proxies will not handle the merging automatically. The solution is similar to the cross-node join problem: fetch the results on each node and then merge them at the application side. Unlike joins, each node's query can be executed in parallel, so it is often much faster than a single large table; but if the result set is large, the application's memory consumption becomes a problem.
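A sketch of the application-side merge for a GROUP BY count (the per-shard results below are made up):

```python
from collections import Counter

# Each shard returns its own GROUP BY result; the application merges.
# COUNT and SUM merge by addition; AVG must be merged from (sum, count)
# pairs, not by averaging the per-shard averages.
shard_results = [
    {"beijing": 120, "shanghai": 80},   # GROUP BY city on shard 0
    {"beijing": 30, "shenzhen": 50},    # GROUP BY city on shard 1
]

merged = Counter()
for partial in shard_results:
    merged.update(partial)              # adds counts key by key

print(dict(merged))  # {'beijing': 150, 'shanghai': 80, 'shenzhen': 50}
```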

Data migration, capacity planning, and scaling are another class of issues. The Taobao integrated service platform team uses the forward-compatible property of modulo by powers of 2 (for example, a number that is 1 mod 4 is also 1 mod 2) to assign data, which avoids row-level data migration; but table-level migration is still required, and both the expansion scale and the number of tables are limited. In general, none of these ideas is ideal — each has some shortcomings — which reflects from one side the difficulty of scaling under sharding.

ID problem

Once the database is split across multiple physical nodes, we can no longer rely on the database's own primary-key generation mechanism. On the one hand, an ID generated by one partitioned database cannot be guaranteed to be globally unique; on the other hand, the application needs to obtain the ID before inserting data, in order to perform SQL routing. Some common primary-key generation strategies:

UUID: using a UUID as the primary key is the simplest solution, but the downsides are also very obvious. Because UUIDs are very long, besides taking up a lot of storage space, the main problem lies with the index: both building the index and querying by the index have performance problems.
Twitter's distributed Snowflake ID algorithm: in distributed systems that need to generate globally unique IDs, Twitter's Snowflake solves this need, and the implementation is also very simple. Removing the configuration information, the core is a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence number.
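A sketch of the Snowflake layout described above (the epoch value is Twitter's commonly cited one; clock-rollback handling is deliberately simplified here):

```python
import threading
import time

# 41-bit millisecond timestamp | 10-bit machine id | 12-bit sequence.
class Snowflake:
    EPOCH = 1288834974657  # Twitter's custom epoch, in milliseconds

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:   # sequence exhausted, wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = Snowflake(machine_id=1)
ids = [gen.next_id() for _ in range(1000)]
print(len(set(ids)) == len(ids), ids == sorted(ids))  # True True
```

IDs from one generator are unique and time-ordered, which also keeps InnoDB primary-key inserts roughly sequential — an advantage over random UUIDs.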

Cross-shard sorting and paging

Generally speaking, paging needs to sort by a specified field. When the sorting field is the sharding field, the sharding rule lets us easily locate the designated shard; when the sorting field is not the sharding field, things get more complicated. For the accuracy of the final result, we need to sort the data within each shard node and return it, then summarize and re-sort the result sets returned by the different shards, and finally return the result to the user, as shown in the figure below:
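The merge step can be sketched like this (the shard data is made up; each shard is assumed to return its rows pre-sorted by the paging field):

```python
import heapq

# Each shard returns rows already sorted by the paging field;
# the application merge-sorts the streams and slices out the page.
shard0 = [(1, "a"), (4, "d"), (7, "g")]   # (sort_key, payload)
shard1 = [(2, "b"), (5, "e")]
shard2 = [(3, "c"), (6, "f")]

def page_across_shards(shards, page, page_size):
    merged = list(heapq.merge(*shards))    # k-way merge on sort_key
    start = page * page_size
    return merged[start:start + page_size]

print(page_across_shards([shard0, shard1, shard2], page=1, page_size=3))
# [(4, 'd'), (5, 'e'), (6, 'f')]
```

Note the cost: to serve page N correctly, every shard must return at least (N+1) × page_size rows, so deep pages get progressively more expensive.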

[figure: each shard sorts and returns its data; the results are merged and re-sorted before being returned to the user]
