Understanding MySQL select order by Rand: why is unrelated KEY used in some case?-CodePudding

Here's exactly what I did

# mysql --version
mysql  Ver 14.14 Distrib 5.7.34, for Linux (x86_64) using  EditLine wrapper

Case 1:

CREATE TABLE `Foo2` (
  `id` bigint(20) unsigned NOT NULL,
  `column1` int(10) unsigned NOT NULL,
  `column2` int(10) unsigned NOT NULL,
  `column3` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `composite` (`column1`, `column2`, `column3`)
) ENGINE=InnoDB;
# BASH: for i in `seq 1 10000`; do mysql -uroot -p"password" -e "replace into foo_db.Foo2 (\`id\`,\`column1\`,\`column2\`,\`column3\`) values ($((i * 2)), ${i}, $((i   1)), NOW());" >/dev/null 2>&1; done

and the following explain with result

explain select * from foo_db.Foo2 order by RAND() limit 5;

select count(*) from foo_db.Foo2;
# result : 10000

Case 2:

CREATE TABLE `Foo3` (
  `id` bigint(20) unsigned NOT NULL,
  `column1` int(10) unsigned NOT NULL,
  `column2` int(10) unsigned NOT NULL,
  `column3` datetime NOT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `composite` (`column1`, `column2`, `column3`)
) ENGINE=InnoDB;
# BASH: for i in `seq 1 10000`; do mysql -uroot -p"password" -e "replace into foo_db.Foo3 (\`id\`,\`column1\`,\`column2\`,\`column3\`) values ($((i * 2)), ${i}, $((i   1)), NOW());" >/dev/null 2>&1; done

explain select * from foo_db.Foo3 order by RAND() limit 5;

explain select `id` from foo_db.Foo3 order by RAND() limit 5;

select count(*) from foo_db.Foo3;
# result : 10000

Points that I don't understand

Why the number of rows shown in explain does not match the actual select count(*)?
In Case 1, why the unrelated composite key is used in order by RAND() while possible_keys is null?
In Case 2, why the behavior is different from Case 1 with select * but the behavior becomes the same as Case 1 if I do select id instead?
In Case 2, why is the primary key id not used in the select id result?

Partial helps would also be appreciated.

CodePudding user response：

First, let me list a bunch of things. These will handle some of your questions.

The Optimizer can read the data's BTree or a secondary index's BTree. In the latter case, the INDEX is called "covering" and indicated in EXPLAIN via "Using index".
ENGINE=InnoDB includes all the PK columns in each secondary index.
Case 1's secondary index contains all the columns that are needed for SELECT *.
Therefore, cases 1 and 2b can use a covering index.
The Optimizer, if given a choice of which BTree to use, usually picks the smaller one. Perhaps the secondary index for case 1 happened to be smaller.
The PRIMARY KEY is "clustered" with the data.

Now for your specific questions.

With InnoDB, the exact number of rows is not kept; instead it is estimated at run time. Explain's "Rows" column is sometimes off by a factor of 2 either way. (Sometimes even farther off.)
Why "NULL"? (See above, plus...) One of those mysteries of life. I find it low on the list of things to figure out.
and
SELECT * needs all the columns, but there are now some columns that are not in the secondary index.SELECT id -- as mentioned above about the "smallest".

Somewhat related: See Random for the fastest algorithms for fetching a random subset of a table.