The below query is going for a full table scan and doesn't know where it has a problem.
Query:
select journey_level,
src_screen_name from_entity_name,
src_screen_id from_entity_id,
dest_screen_name to_entity_name,
dest_screen_id to_entity_id,
dest_screen_id to_entity_id_list,
concat(journey_level, '-', src_screen_id, '-', src_screen_name) from_entity,
concat(journey_level 1, '-', dest_screen_id, '-', dest_screen_name) to_entity,
count(1) cnt,
sum(ifnull(is_crash, 0)) crash_cnt,
sum(ifnull(is_anr, 0)) anr_cnt,
sum(ifnull(is_rage, 0)) rage_cnt,
case dest_screen_id
when -1
then 'Red'
else ''
end node_color
from ue_summary.user_journey_screens_1943 s
where first_session_created_at between '2022-09-17 00:00:00' and '2022-10-17 10:42:35'
and s.platform = '1'
and journey_level <= '4'
group by journey_level,
from_entity_id,
from_entity_name,
to_entity_id,
to_entity_name;
explain plan ;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
partitions: NULL
type: ALL
possible_keys: date_platform_level
key: NULL
key_len: NULL
ref: NULL
rows: 799471542
filtered: 1.67
Extra: Using where; Using temporary
table structure;
*************************** 1. row ***************************
Table: user_journey_screens_1943
Create Table: CREATE TABLE `user_journey_screens_1943` (
`id` bigint NOT NULL AUTO_INCREMENT,
`app_id` int DEFAULT NULL,
`platform` tinyint DEFAULT NULL,
`asi` bigint DEFAULT NULL,
`first_session_created_at` datetime DEFAULT NULL,
`user_task_id` bigint DEFAULT NULL,
`app_version_id` int DEFAULT NULL,
`journey_level` int DEFAULT NULL,
`src_screen_id` bigint DEFAULT NULL,
`src_screen_name` varchar(300) DEFAULT NULL,
`src_capture_time` bigint DEFAULT NULL,
`src_capture_time_relative` bigint DEFAULT NULL,
`dest_screen_id` bigint DEFAULT NULL,
`dest_screen_name` varchar(300) DEFAULT NULL,
`dest_capture_time` bigint DEFAULT NULL,
`dest_capture_time_relative` bigint DEFAULT NULL,
`is_crash` tinyint DEFAULT NULL,
`is_anr` tinyint DEFAULT NULL,
`is_rage` tinyint DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `app_id_date_level` (`app_id`,`first_session_created_at`,`journey_level`),
KEY `asi` (`asi`),
KEY `date_platform_level` (`first_session_created_at`,`platform`,`journey_level`)
) ENGINE=InnoDB AUTO_INCREMENT=1937612717 DEFAULT CHARSET=latin1
did I miss any index or due to any other?
CodePudding user response:
Your query's WHERE clause contains two value-range filters, one on the DATETIME
column and another on an INT with apparently small values. This is in addition to one equality filter on a TINYINT.
MySQL's indexes are BTREE indexes. You can think of them as lookup tables sorted in order. So, for a query to use an index effectively, the index should start with the columns you match for equality, and then end with just one column you range-match. I guess that your DATETIME column is more selective than your TINYINT column, so I guess the index should contain that column.
So, if you put an index on (platform, first_session_created_at)
it will help your query. Why?
MySQL can random-access the index to the first eligible row. In your case it's the first row with platform=1
and a date >= the start of your date range.
Then, it can do an index range scan: it can scan the index sequentially until it gets to the last eligible row. Your plan shows, in contrast, that it is scanning through the entire table to get the results, because it decided that would take less time than scanning your index that starts with your DATETIME column. In general, an index range scan much faster than a full table scan. So, you want this:
CREATE INDEX platform_date ON tbl (platform, first_session_created_at);
There's one more possible refinement here: You are doing GROUP BY on several columns. If you include those columns in the index in the same order as your GROUP BY, MySQL may be able to do the grouping operation sequentially from the index, rather than having to build an internal temporary table. This may or may not work. If it does, using temporary
won't appear in your EXPLAIN. Try adding columns to the index as follows:
CREATE INDEX platform_date ON tbl (platform,
first_session_created_at,
journey_level,
from_entity_id,
from_entity_name,
to_entity_id,
to_entity_nam );