I am trying to count the rows having created_at
>=
2019-03-01
.
created_at
format is like 2018-03-18T14:37:35.000Z
I noticed that the table uses indexing as follows : but the created_at doesn't have an index :'( how can i use the current indexes to speed it up ?
{
"records": [
{
"Table": "bugs",
"Non_unique": 0,
"Key_name": "PRIMARY",
"Seq_in_index": 1,
"Column_name": "id",
"Collation": "A",
"Cardinality": 9791826,
"Sub_part": null,
"Packed": null,
"Null": "",
"Index_type": "BTREE",
"Comment": "",
"Index_comment": ""
},
{
"Table": "bugs",
"Non_unique": 1,
"Key_name": "index_bugs_on_category_and_token_and_reported_at",
"Seq_in_index": 1,
"Column_name": "category",
"Collation": "A",
"Cardinality": 1,
"Sub_part": null,
"Packed": null,
"Null": "YES",
"Index_type": "BTREE",
"Comment": "",
"Index_comment": ""
},
{
"Table": "bugs",
"Non_unique": 1,
"Key_name": "index_bugs_on_category_and_token_and_reported_at",
"Seq_in_index": 2,
"Column_name": "token",
"Collation": "A",
"Cardinality": 29946,
"Sub_part": null,
"Packed": null,
"Null": "YES",
"Index_type": "BTREE",
"Comment": "",
"Index_comment": ""
},
{
"Table": "bugs",
"Non_unique": 1,
"Key_name": "index_bugs_on_category_and_token_and_reported_at",
"Seq_in_index": 3,
"Column_name": "reported_at",
"Collation": "A",
"Cardinality": 6085027,
"Sub_part": null,
"Packed": null,
"Null": "YES",
"Index_type": "BTREE",
"Comment": "",
"Index_comment": ""
}
]
}
I tried to do :
SELECT count(id) FROM table WHERE created_at >= '20190301';
but it always gives server timeout.
it's worth mentioning that counting rows doesn't timeout the server :
SELECT count(*) from bugs;
output :
"records": [
{
"count(*)": 10049501
}
]
}
CodePudding user response:
If your question is why is the (*) does not time out the answer is this -- in order to count the (id) it has to validate that the id column does not contain null. null is not counted by the count statement. Since the table has 10 million rows it does not finish before the timeout.
CodePudding user response:
Indexing created_at
is the most important thing you can do to optimize your query.
If created_at
is not indexed, it will have to scan the whole table. If you run explain SELECT count(id) FROM table WHERE created_at >= '20190301'
you will see something like this:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE test ALL 123 100.00 Using where
ALL
meaning it must scan all 10 million rows.
With an index it can quickly find only those rows where created_at >= '20190301'
.
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE test range created_at created_at 4 1 100.00 Using where; Using index
However, if created_at >= '20190301'
would be a large portion of the table it might decide scanning the whole table is faster than using the index.
The second thing you can do is to make sure created_at
is a datetime
type, not varchar
. This will store the date as a number and the comparison will be faster. This will not improve performance nearly as much as an index will, but it will help. More importantly, it will make it much easier to work with the values and you can take advantage of date and time functions.