Home > other >  MySQL COUNT(Joining table) is taking too long to fetch the data inside large tables
MySQL COUNT(Joining table) is taking too long to fetch the data inside large tables

Time:11-17

Description:

We have two tables as below:

  1. table_1 ("question" main table)
  2. table_2 ("question_attempted" joining table)

Cases:

  1. In "table_2" we have a column that has a column "is_correct" (holds 1,0) for right or wrong answers.
  2. In "table_1" we have 1 m records and in "table_2" we have 10m records

We want to sort our listing data by below columns/values:

  1. Total number of times questions were attempted
  2. Total number of times questions were answered correctly
  3. The percentages questions were answered correctly (based on above two values)

Issue:

As soon as we join the table_1 and table_2 to get the count of total_questions_attempted, total_questiones_give_correct_answer, perntage_corrected_given_answers. The query starts taking around 8-10 minutes to run. Table structures are given below. Thanks in advance.

Table structures:

CREATE TABLE IF NOT EXISTS `question` (
  `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `category` bigint(20) NOT NULL DEFAULT 0,
  `parent` bigint(20) UNSIGNED NOT NULL DEFAULT 0,
  `name` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `questiontext` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `questiontextformat` tinyint(4) NOT NULL DEFAULT 0,
  `generalfeedback` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `generalfeedbackformat` tinyint(4) NOT NULL DEFAULT 0,
  `defaultmark` decimal(12,7) NOT NULL DEFAULT 1.0000000,
  `penalty` decimal(12,7) NOT NULL DEFAULT 0.3333333,
  `qtype` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT '''1''',
  `length` bigint(20) UNSIGNED NOT NULL DEFAULT 1,
  `stamp` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `version` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `hidden` tinyint(3) UNSIGNED NOT NULL DEFAULT 0,
  `timecreated` bigint(20) UNSIGNED NOT NULL DEFAULT 0,
  `timemodified` bigint(20) UNSIGNED NOT NULL DEFAULT 0,
  `createdby` bigint(20) UNSIGNED DEFAULT NULL,
  `modifiedby` bigint(20) UNSIGNED DEFAULT NULL,
  `type_data_id` bigint(20) NOT NULL,
  `img_id` bigint(20) DEFAULT NULL,
  `qimg_gallary_text` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `qrimg_gallary_text` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `qimg_gallary_ids` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `qrimg_gallary_ids` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `case_id` bigint(20) NOT NULL DEFAULT 0,
  `ques_type_id` bigint(20) DEFAULT NULL,
  `year` bigint(20) DEFAULT NULL,
  `spec` bigint(20) DEFAULT NULL,
  `sub_speciality_id` int(11) DEFAULT NULL,
  `sub_sub_speciality_id` int(11) DEFAULT NULL,
  `spec_level` bigint(20) DEFAULT 1,
  `is_deleted` int(11) NOT NULL DEFAULT 0,
  `sequence` int(11) NOT NULL DEFAULT 0,
  `sort_order` bigint(20) NOT NULL DEFAULT 0 COMMENT 'Question order in list',
  `idnumber` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `addendum` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `text_for_search` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'this is for the text based searching, this will store the text of the question without html tags',
  `text_for_search_ans` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `type_data_id` (`type_data_id`),
  UNIQUE KEY `mdl_ques_catidn_uix` (`category`,`idnumber`),
  KEY `mdl_ques_cat_ix` (`category`),
  KEY `mdl_ques_par_ix` (`parent`),
  KEY `mdl_ques_cre_ix` (`createdby`),
  KEY `mdl_ques_mod_ix` (`modifiedby`),
  KEY `id` (`id`),
  KEY `mq_spec_ix` (`spec`),
  KEY `sort_order` (`sort_order`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='The questions themselves';
CREATE TABLE IF NOT EXISTS `question_attempted` (
  `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `questionusageid` bigint(20) UNSIGNED NOT NULL,
  `slot` bigint(20) UNSIGNED NOT NULL,
  `behaviour` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `questionid` bigint(20) UNSIGNED NOT NULL,
  `variant` bigint(20) UNSIGNED NOT NULL DEFAULT 1,
  `maxmark` decimal(12,7) NOT NULL,
  `minfraction` decimal(12,7) NOT NULL,
  `flagged` tinyint(3) UNSIGNED NOT NULL DEFAULT 2,
  `questionsummary` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `rightanswer` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `responsesummary` text COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `timemodified` bigint(20) UNSIGNED NOT NULL,
  `maxfraction` decimal(12,7) DEFAULT 1.0000000,
  `in_remind_state` int(11) NOT NULL DEFAULT 0,
  `is_correct` tinyint(1) DEFAULT 1,
  PRIMARY KEY (`id`),
  UNIQUE KEY `mdl_quesatte_queslo_uix` (`questionusageid`,`slot`),
  KEY `mdl_quesatte_que_ix` (`questionid`),
  KEY `mdl_quesatte_que2_ix` (`questionusageid`),
  KEY `mdl_quesatte_beh_ix` (`behaviour`),
  KEY `questionid` (`questionid`),
  KEY `is_correct` (`is_correct`)
) ENGINE=InnoDB AUTO_INCREMENT=151176 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='Each row here corresponds to  an attempt at one question, as ';   

I tried with the below query:

SELECT mq.id, mq.name, COUNT(is_correct) 
FROM mdl_question_attempts as mqa 
LEFT JOIN mdl_question mq on mq.id = mqa.questionid where mq.id IS NOT NULL and mq.is_deleted = '0' 
GROUP by mqa.questionid 
ORDER by mq.sort_order desc, mq.id DESC 
LIMIT 50

https://i.stack.imgur.com/mHK6W.png

CodePudding user response:

The correct query is

SELECT mq.id, mq.name, COUNT(mqa.questionid)
FROM mdl_question mq
LEFT JOIN mdl_question_attempts mqa ON mq.id = mqa.questionid AND mqa.is_correct
WHERE NOT mq.is_deleted
GROUP by mq.id
ORDER by mq.sort_order DESC, mq.id DESC 
LIMIT 50;

Now let's see, how fast this can get. There is just one criteria on the question table (WHERE NOT mq.is_deleted). We can probably assume that very many if not most questions are not deleted, so using an index here makes no sense on first glance; reading the full table seems quicker.

Then we outer join the answers on the question ID and the is_correct flag. This means we should at least have an index on the ID, better even on the ID and the flag:

CREATE INDEX idx1 ON mdl_question_attempts (questionid, is_correct);

Now we must order all rows by the question's sort_order and ID to get the first 50 rows. It would be great to have an index that is already sorted, so we could just take the first 50 entries from there. But then, we are only looking at rows matching NOT mq.is_deleted, so the index must include that flag:

CREATE INDEX idx2 ON mdl_question (is_deleted, sort_order DESC, id DESC);

We could even include the name, so all data is available from the index and the table must not be read anymore (covering index).

CREATE INDEX idx2 ON mdl_question (is_deleted, sort_order DESC, id DESC, name);

It is still up to the DBMS to use these indexes or not. We are just providing them to give the DBMS the option. With this query it depends on how well MySQL's optimizer works. Does it see that it can just read the 50 first entries from the question index and then use the answer index for the simple counting?

CodePudding user response:

add index in your table

CREATE INDEX index_name ON table_name (column_name);

references : https://www.w3schools.com/sql/sql_create_index.asp

  • Related