Here's exactly what I did
# mysql --version
mysql Ver 14.14 Distrib 5.7.34, for Linux (x86_64) using EditLine wrapper
Case 1:
CREATE TABLE `Foo2` (
`id` bigint(20) unsigned NOT NULL,
`column1` int(10) unsigned NOT NULL,
`column2` int(10) unsigned NOT NULL,
`column3` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `composite` (`column1`, `column2`, `column3`)
) ENGINE=InnoDB;
# BASH: for i in `seq 1 10000`; do mysql -uroot -p"password" -e "replace into foo_db.Foo2 (\`id\`,\`column1\`,\`column2\`,\`column3\`) values ($((i * 2)), ${i}, $((i 1)), NOW());" >/dev/null 2>&1; done
and the following explain with result
explain select * from foo_db.Foo2 order by RAND() limit 5;
select count(*) from foo_db.Foo2;
# result : 10000
Case 2:
CREATE TABLE `Foo3` (
`id` bigint(20) unsigned NOT NULL,
`column1` int(10) unsigned NOT NULL,
`column2` int(10) unsigned NOT NULL,
`column3` datetime NOT NULL,
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `composite` (`column1`, `column2`, `column3`)
) ENGINE=InnoDB;
# BASH: for i in `seq 1 10000`; do mysql -uroot -p"password" -e "replace into foo_db.Foo3 (\`id\`,\`column1\`,\`column2\`,\`column3\`) values ($((i * 2)), ${i}, $((i 1)), NOW());" >/dev/null 2>&1; done
explain select * from foo_db.Foo3 order by RAND() limit 5;
explain select `id` from foo_db.Foo3 order by RAND() limit 5;
select count(*) from foo_db.Foo3;
# result : 10000
Points that I don't understand
Why the number of rows shown in explain does not match the actual select count(*)?
In Case 1, why the unrelated
composite
key is used inorder by RAND()
whilepossible_keys
is null?In Case 2, why the behavior is different from Case 1 with
select *
but the behavior becomes the same as Case 1 if I doselect id
instead?In Case 2, why is the primary key
id
not used in theselect id
result?
Partial helps would also be appreciated.
CodePudding user response:
First, let me list a bunch of things. These will handle some of your questions.
- The Optimizer can read the data's BTree or a secondary index's BTree. In the latter case, the INDEX is called "covering" and indicated in EXPLAIN via "Using index".
- ENGINE=InnoDB includes all the PK columns in each secondary index.
- Case 1's secondary index contains all the columns that are needed for
SELECT *
. - Therefore, cases 1 and 2b can use a covering index.
- The Optimizer, if given a choice of which BTree to use, usually picks the smaller one. Perhaps the secondary index for case 1 happened to be smaller.
- The PRIMARY KEY is "clustered" with the data.
Now for your specific questions.
- With InnoDB, the exact number of rows is not kept; instead it is estimated at run time. Explain's "Rows" column is sometimes off by a factor of 2 either way. (Sometimes even farther off.)
- Why "NULL"? (See above, plus...) One of those mysteries of life. I find it low on the list of things to figure out.
- and
SELECT *
needs all the columns, but there are now some columns that are not in the secondary index.SELECT id
-- as mentioned above about the "smallest".
Somewhat related: See Random for the fastest algorithms for fetching a random subset of a table.