Home > Enterprise >  Understanding MySQL select order by Rand: why is unrelated KEY used in some case?
Understanding MySQL select order by Rand: why is unrelated KEY used in some case?

Time:07-25

Here's exactly what I did

# mysql --version
mysql  Ver 14.14 Distrib 5.7.34, for Linux (x86_64) using  EditLine wrapper

Case 1:

CREATE TABLE `Foo2` (
  `id` bigint(20) unsigned NOT NULL,
  `column1` int(10) unsigned NOT NULL,
  `column2` int(10) unsigned NOT NULL,
  `column3` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `composite` (`column1`, `column2`, `column3`)
) ENGINE=InnoDB;
# BASH: for i in `seq 1 10000`; do mysql -uroot -p"password" -e "replace into foo_db.Foo2 (\`id\`,\`column1\`,\`column2\`,\`column3\`) values ($((i * 2)), ${i}, $((i   1)), NOW());" >/dev/null 2>&1; done

and the following explain with result

explain select * from foo_db.Foo2 order by RAND() limit 5;

enter image description here

select count(*) from foo_db.Foo2;
# result : 10000

Case 2:

CREATE TABLE `Foo3` (
  `id` bigint(20) unsigned NOT NULL,
  `column1` int(10) unsigned NOT NULL,
  `column2` int(10) unsigned NOT NULL,
  `column3` datetime NOT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `composite` (`column1`, `column2`, `column3`)
) ENGINE=InnoDB;
# BASH: for i in `seq 1 10000`; do mysql -uroot -p"password" -e "replace into foo_db.Foo3 (\`id\`,\`column1\`,\`column2\`,\`column3\`) values ($((i * 2)), ${i}, $((i   1)), NOW());" >/dev/null 2>&1; done
explain select * from foo_db.Foo3 order by RAND() limit 5;

enter image description here

explain select `id` from foo_db.Foo3 order by RAND() limit 5;

enter image description here

select count(*) from foo_db.Foo3;
# result : 10000

Points that I don't understand

  1. Why the number of rows shown in explain does not match the actual select count(*)?

  2. In Case 1, why the unrelated composite key is used in order by RAND() while possible_keys is null?

  3. In Case 2, why the behavior is different from Case 1 with select * but the behavior becomes the same as Case 1 if I do select id instead?

  4. In Case 2, why is the primary key id not used in the select id result?

Partial helps would also be appreciated.

CodePudding user response:

First, let me list a bunch of things. These will handle some of your questions.

  • The Optimizer can read the data's BTree or a secondary index's BTree. In the latter case, the INDEX is called "covering" and indicated in EXPLAIN via "Using index".
  • ENGINE=InnoDB includes all the PK columns in each secondary index.
  • Case 1's secondary index contains all the columns that are needed for SELECT *.
  • Therefore, cases 1 and 2b can use a covering index.
  • The Optimizer, if given a choice of which BTree to use, usually picks the smaller one. Perhaps the secondary index for case 1 happened to be smaller.
  • The PRIMARY KEY is "clustered" with the data.

Now for your specific questions.

  1. With InnoDB, the exact number of rows is not kept; instead it is estimated at run time. Explain's "Rows" column is sometimes off by a factor of 2 either way. (Sometimes even farther off.)
  2. Why "NULL"? (See above, plus...) One of those mysteries of life. I find it low on the list of things to figure out.
  3. and
  4. SELECT * needs all the columns, but there are now some columns that are not in the secondary index.SELECT id -- as mentioned above about the "smallest".

Somewhat related: See Random for the fastest algorithms for fetching a random subset of a table.

  • Related