Home > Software design >  Does the column order matter in a WHERE SQL statement
Does the column order matter in a WHERE SQL statement

Time:04-10

I'm working on a system with a good amount of indexes. Some indexes are simpler than others. I.e. they're INT, VARCHAR, DATETIME and in some cases ENUMS(maybe 5~25 variations).

Does the WHERE order matter? In other words, would placing the easier to search columns first increase speed/performance?

i.e. Let's say we have this table and it looks like this

CREATE TABLE IF NOT EXISTS `example_table` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `user_id` int(11) unsigned NOT NULL, -- 1 ~ 4,294,967,295 (non unique)
  `type_id` int(11) unsigned NOT NULL, -- (Enum with 15 values)
  `name` VARCHAR(255) NOT NULL, -- alphanumeric 
  `boolean_value` tinyInt(1) DEFAULT 0, -- only 0 or 1
  `created_date` DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE INDEX `example_table_user_id_index` ON `example_table` (`user_id`);
CREATE INDEX `example_table_type_id_index` ON `example_table` (`type_id`);
CREATE INDEX `example_table_name_index` ON `example_table` (`name`);
CREATE INDEX `example_table_boolean_value_index` ON `example_table` (`boolean_value`);
CREATE INDEX `example_table_created_date_index` ON `example_table` (`created_date`)

Is it correct to say that we want to search for the column which will return the least results first that way the next condition will have less columns to flip through?

i.e. Would these statements give different performance results?

1. SELECT id FROM example_table WHERE boolean_value = ? AND user_id = ? AND type_id = ? LIMIT 1000
2. SELECT id FROM example_table WHERE type_id = ? AND user_id = ? AND boolean_value = ? LIMIT 1000
3. SELECT id FROM example_table WHERE user_id = ? AND type_id = ? AND boolean_value = ? LIMIT 1000

CodePudding user response:

Does the WHERE order matter? In other words, would placing the easier to search columns first increase speed/performance?

Short answer : no.

Longer answer : SQL is a declarative, not procedural, language. It's the only declarative language most of us devs deal with regularly. The server software has query planning modules that use various schemes for figuring out the most-likely most efficient way of getting the result set from each query. The query plan can vary, even with the same query and different data. So we tell it what we want, not how to get it.

Sometimes it's necessary to adjust indexes or refactor queries to get decent performance. You can read about that in the tag. But the refactoring is never as simple as changing the order of terms in WHERE clauses.

And, pro tip: lots of single column indexes are very rarely a good idea. Indexes need to be designed to match the shape of the queries in use. Read this ebook by Marcus Winands: https://use-the-index-luke.com

CodePudding user response:

(See OJones for the answer to your actual question. Now for optimizing the queries.)

These will optimize identically:

1. SELECT id FROM example_table
     WHERE boolean_value = ?
       AND user_id = ?
       AND type_id = ? LIMIT 1000

2. SELECT id FROM example_table
     WHERE type_id = ?
       AND user_id = ?
       AND boolean_value = ? LIMIT 1000

3. SELECT id FROM example_table
     WHERE user_id = ?
       AND type_id = ?
       AND boolean_value = ? LIMIT 1000

But the optimal index will be a 3-column ("composite") index such as

INDEX(boolean_value, type_id, user_id)

The order in the index will not -- in this example -- matter.

Your single-column indexes will be inadequate.

  • Related