Suppose I have two tables (with the same amount of rows). The first one only has one column, say ID
. The second one has the same column plus an additional one, say Text
(of type longtext
) which may contain a lot of data.
Now consider a query of the form SELECT ID FROM table WHERE ID>=7
.
Question: Is this query slower on the second table then on the first one? The query does not use the Text
column, but maybe this data still affects the performance of the query.
CodePudding user response:
In this simple query the most part of the query execution time is a time needed for reading the data from disk, and total time is approximately equal to disk reading time.
Case 1. The column is not indexed in both tables.
The server should scan (and hence should read) the whole table for to find the rows matched the condition. In this case the time needed for the query execution is approximately proportional to the disk size occupied by the table.
Case 2. The column is clustered index expression in both tables.
This can be when this column is the only column in the primary key expression or, if the primary key is not defined, when this column is the only column in the unique key expression, this unique key is the most first index defined in the table structure (physically in CREATE TABLE code), and this column is defined as NOT NULL.
The server should read the table part starting from the block which contains the row with first value matched the condition for to find the rows matched this condition. The time needed for the query execution is approximately proportional to the disk size occupied by the table part to be read, so it is approximately proportional to the disk size occupied by the whole table. But when the table on-disk body pages contains too many "empty" elements (deleted rows) then the difference is unpredictable.
Case 3. The column is indexed as secondary index in both tables.
In this case the index is separate on-disk structure, and the server will read this index only. When the index expressions are equal, and the datatypes are equal too, then the disk sizes occupied by these indexes are approximately equal too. And the time needed for both queries execution will be approximately the same.
Case 4. In one table this column is clustered index expression and in another table it is secondary index expression.
The disk size needed for secondary index is less than the disk size needed for clustered index. So the query to the table in which this column is secondary index will be faster.
CodePudding user response:
To echo "PM-77's" reply: if ID
is a primary key, then it necessarily has an (implicit ...) index. The query will make use of this index.
Therefore, any other (irrelevant) structural differences will not affect the performance of the query.