Home > database >  Different speed of theoretically equal queries on MySQL
Different speed of theoretically equal queries on MySQL

Time:05-22

I have found a strange speed issue with one of my MySQL queries when run on two different columns, date_from vs date_to.

The table structure is the following:

create table if not exists table1 (
    id unsigned int,
    field2 int,
    field3 varchar(32),
    date_from date not null,
    date_to date not null,
    field6 text
);

create unique index idx_uniq_table1 on table1 (id, field2, field3, date_from);
create index idx_table1_id on table1 (id);
create index idx_table1_field2 on table1 (field2);
create index idx_table1_field3 on table1 (field3);
create index idx_table1_date_from on table1 (date_from);
create index idx_table1_date_to on table1 (date_to);

When I run this query using date_from, execution time is 1.487 seconds:

select field3, min(date_from) from table1 group by field3;

When I run this other query using date_to, execution time is 13.804 seconds, almost 10 times slower:

select field3, max(date_to) from table1 group by field3;

Both columns are NOT NULL, so there are no empty values.

The table has ~7M rows.

The only difference that I see between these two columns is that date_from appears in the unique index but, as far as I know, that should't make a difference if not filtering by all four columns in the index.

Am I missing anything?

This is the explain of the date_from column:

{
    "query_block": {
        "select_id": 1,
        "cost_info": {
        "query_cost": "889148.90"
        },
        "grouping_operation": {
            "using_filesort": false,
            "table": {
                "table_name": "table1",
                "access_type": "index",
                "possible_keys": [
                    "idx_uniq_table1",
                    "idx_table1_id",
                    "idx_table1_field2",
                    "idx_table1_field3",
                    "idx_table1_date_from",
                    "idx_table1_date_to"
                ],
                "key": "idx_table1_field3",
                "used_key_parts": [
                    "field3"
                ],
                "key_length": "130",
                "rows_examined_per_scan": 5952609,
                "rows_produced_per_join": 5952609,
                "filtered": "100.00",
                "using_index": true,
                "cost_info": {
                    "read_cost": "293888.00",
                    "eval_cost": "595260.90",
                    "prefix_cost": "889148.90",
                    "data_read_per_join": "908M"
                },
                "used_columns": [
                    "id",
                    "field2",
                    "field3",
                    "date_from"
                ]
            }
        }
    }
}

This is the explain of the date_to column:

{
    "query_block": {
        "select_id": 1,
        "cost_info": {
            "query_cost": "889148.90"
        },
        "grouping_operation": {
            "using_filesort": false,
            "table": {
                "table_name": "table1",
                "access_type": "index",
                "possible_keys": [
                    "idx_uniq_table1",
                    "idx_table1_id",
                    "idx_table1_field2",
                    "idx_table1_field3",
                    "idx_table1_date_from",
                    "idx_table1_date_to"
                ],
                "key": "idx_table1_field3",
                "used_key_parts": [
                    "field3"
                ],
                "key_length": "130",
                "rows_examined_per_scan": 5952609,
                "rows_produced_per_join": 5952609,
                "filtered": "100.00",
                "cost_info": {
                    "read_cost": "293888.00",
                    "eval_cost": "595260.90",
                    "prefix_cost": "889148.90",
                    "data_read_per_join": "908M"
                },
                "used_columns": [
                    "id",
                    "field2",
                    "field3",
                    "date_from",
                    "date_to"
                ]
            }
        }
    }
}

The only difference I see is in used_columns, at the end, where one contains date_to and the other doesn't.

CodePudding user response:

Naughty. There is no PRIMARY KEY.

Since the "used columns" does not seem to agree with the queries, I don't want to try to explain the timing difference.

Replace the index on field3 by these two:

INDEX(field3, date_from)
INDEX(field3, date_to)

Those will speed up your two Selects.

CodePudding user response:

In addition to Rick's answer about proper index based on what your criteria is... The reason for the speed difference is that the one index that had both the field3 and date_from, the engine was able to use the data within the actual index instead of having to go to the raw data pages that contain the entire record. The index that only had the date_to still had to go to every raw data record to get the field3, thus taking the time.

That is why you can utilize covering indexes. Having each component of data you are looking for to optimize the query. Not saying you want an index with 20 columns, but in this context of what might be common criteria for filtering is exactly why you do.

  • Related