Build secondary indexes more freely-CodePudding

There is a table t_bl with the following fields (id,a,b).

select a,b from t_bl where a = "XXX";

In this case, the b field is not used as the basis for retrieval, but needs to appear in the retrieval results.

Then there are the following two index building schemes.

Create an index on the a field

Advantage: Makes each node of the index reduce the space overhead of storing the b field.

Disadvantage: Returning the query results requires returning the table to the clustered index through the primary key id of the secondary index to query the value of the b field, which affects the query performance.
Create a joint index of a,b fields

Advantage: Covering indexes can be used to reduce return tables and improve query efficiency.

Disadvantage: Making every non-node in the secondary index adds unnecessary space overhead for storing the b field (because the b field is not the basis for retrieval).

So why not provide a mechanism that enables users to create a secondary index that is asymmetric between non-leaf nodes and leaf nodes?

For example, in this example, the user's better choice is to create a non-leaf node to store the a field, and the leaf node to store the indexes of the two fields a and b.

CodePudding user response：

Some implementations of SQL databases do exactly what you describe, adding a column to the leaf node only, so it doesn't take space in the non-leaf nodes, but can be used for covering indexes.

An example of a product that does this is Microsoft SQL Server, which supports syntax allowing you to define optional non-key columns to INCLUDE() in a secondary index. See https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-ver16

However, InnoDB is not currently implemented to do this. As far as I know, there's no reason it can't do this, but they have not implemented it. I guess other features were higher priority.

For what it's worth, the SQL standard doesn't include anything about indexes, so each vendor implements their indexing feature totally as an extension to the standard. They are therefore free to implement index features according to their own priorities.

CodePudding user response：

Simply make a "composite" "covering" index:

INDEX(a, b)

That particular query will run faster because it does not need to bounce between the index's BTree and the data's BTree.

In some similar situations, I recommend

PRIMARY KEY(a, id),  -- all columns are efficiently accessed via a=..
INDEX(id)   -- to keep AUTO_INCREMENT happy

As for 2's disadvantage -- It is tiny. I like the Rule of Thumb that MySQL's BTrees have a fan-out of about 100. That is, each node has 100 child nodes. The corollary is that the non-leaf nodes take up only 1% of the total disk space. (This may be the clinching argument for the developers to say "let's not bother implementing INCLUDE".)