I have a MySQL database with the following structure:
mysql> describe company;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int         | NO   | PRI | NULL    | auto_increment |
| name  | varchar(50) | NO   |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
mysql> describe nameserver;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int          | NO   | PRI | NULL    | auto_increment |
| companyId | int          | NO   | MUL | NULL    |                |
| ns        | varchar(250) | NO   | MUL | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
mysql> describe domain;
+--------------+--------------+------+-----+-------------------+-------------------+
| Field        | Type         | Null | Key | Default           | Extra             |
+--------------+--------------+------+-----+-------------------+-------------------+
| id           | int          | NO   | PRI | NULL              | auto_increment    |
| nameserverId | int          | NO   | MUL | NULL              |                   |
| domain       | varchar(250) | NO   | MUL | NULL              |                   |
| tld          | varchar(20)  | NO   | MUL | NULL              |                   |
| createDate   | datetime     | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
| updatedAt    | datetime     | YES  |     | NULL              |                   |
| status       | tinyint      | NO   |     | NULL              |                   |
| fileNo       | smallint     | NO   | MUL | NULL              |                   |
+--------------+--------------+------+-----+-------------------+-------------------+
The data volume is as follows:
The domain table has about 500 million records.
The nameserver table has about 2 million records.
The following query takes about 4 hours to return a result:
SELECT DISTINCT domain FROM domain
INNER JOIN nameserver ON nameserver.id = domain.nameserverId
WHERE nameserver.companyId = 2
My question is: how can I make this query run faster?
I am able to change the database structure, or even switch to another DBMS.
MySQL is running on a VPS with 8 GB of RAM and a dual-core CPU.
CodePudding user response:
nameserver: INDEX(companyId, id) -- in this order
domain: INDEX(nameserverId, domain) -- in this order
("MUL" does not tell me whether you already have either of these composite indexes. SHOW CREATE TABLE is more descriptive than DESCRIBE.)
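As a rough sketch, the two composite indexes suggested above could be added as follows. The index names idx_company_id and idx_nameserver_domain are hypothetical, chosen only for illustration. Building an index on a table of about 500 million rows will itself take considerable time and disk space, so it may be worth trying this on a copy first.

```sql
-- Check which indexes already exist before adding new ones.
SHOW CREATE TABLE nameserver;
SHOW CREATE TABLE domain;

-- Index names below are hypothetical; the column order is the one given in the answer.
ALTER TABLE nameserver ADD INDEX idx_company_id (companyId, id);
ALTER TABLE domain ADD INDEX idx_nameserver_domain (nameserverId, domain);
```

With the columns in this order, the lookup on companyId and the join on nameserverId can each use the leading column of the index, and the second column supplies the value the query needs next.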
CodePudding user response:
1. Add indexes to the relevant columns: Index companyId in the nameserver table, and nameserverId and domain in the domain table, so that the database can locate the matching rows without scanning either table in full.
2. Use a covering index: A covering index contains every column that the query reads from a table. Since an index belongs to a single table, this means one composite index per table: (companyId, id) on nameserver and (nameserverId, domain) on domain. MySQL can then answer the query from the indexes alone and skip the lookups into the table rows.
3. Use a column-store index: A column-store index stores data by column rather than by row, which can be more efficient for queries that read a few columns from a large dataset. MySQL's InnoDB engine does not offer column-store indexes, so this option applies only if you move to a system that does.
4. Use a database management system that is optimized for large datasets: If your current system is not well suited to this data volume, switching may help. Options to consider include column-oriented systems such as Vertica or ClickHouse, or distributed systems such as Cassandra or HBase.
5. Consider using a distributed database: If the dataset is very large and queries are still slow, a distributed database management system lets you spread the data across multiple servers, which can improve both scalability and performance.

Which of these works best depends on the requirements of your database and the workload you place on it, so benchmark and test the candidates before committing to one.
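One way to check whether points 1 and 2 are taking effect is to look at the execution plan. The sketch below assumes composite indexes on (companyId, id) in nameserver and (nameserverId, domain) in domain already exist. The table aliases ns and d are introduced here only for readability.

```sql
-- Show the plan MySQL chooses for the query from the question.
EXPLAIN
SELECT DISTINCT d.domain
FROM nameserver AS ns
INNER JOIN domain AS d ON d.nameserverId = ns.id
WHERE ns.companyId = 2;
```

In the EXPLAIN output, the key column shows which index is used for each table, and "Using index" in the Extra column indicates that the index covers the query for that table, so no table rows have to be read.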