Home > Back-end >  How to Optimise SQL Query due to Slow Performance?
How to Optimise SQL Query due to Slow Performance?

Time:06-03

I am looking for some help in regards to optimising the following query due to slow performance. The tables and records do not exist on my machine (helping a friend) hence I do not have the insert records and all I know is that the tables are large and contain millions of rows of data. I have been sent the create statements in order to show the table relationships with the foreign keys etc.

The following is the query:

SELECT
    tutor.tutor_id, 
    tutor.subject_id,
    course.title,
    SUBSTR(score.date, 4, 2) AS month,
    AVG(score.score)
FROM
    tutor
    LEFT JOIN course ON tutor.course_id = course.course_id
    LEFT JOIN score ON tutor.course_id = score.course_id
WHERE
    course.course_id IS NOT NULL
    AND score.date > '01-MAR-2020' 
    AND score.student_id in (SELECT student_id FROM students)
GROUP BY 
    tutor.tutor_id, 
    tutor.subject_id,
    course.title,
    SUBSTR(score.date, 4, 2)
    ;

Having looked through the above query, I understand that, there is a sub select query within the where clause that could be slowing the query down and this would be first step to changeover to an INNER JOIN.

I am trying to think of what else I could do, to quicken the query such as selecting the fields (which it already has). Only other thing would be to add a LIMIT to sample query results (can't run this to test myself) and also add an EXPLAIN PLAN to understand what's causing the delays in the results coming back so slow. The other option is to check if the columns in the joins have been indexed.

CREATE STATEMENTS

CREATE TABLE Score (
    score_id integer,
    student_id integer,
    course_id integer,
    date date,
    score integer,
    PRIMARY KEY(score_id),
    FOREIGN KEY(student_id) REFERENCES Students(student_id),
    FOREIGN KEY(course_id) REFERENCES Course(course_id)
);

CREATE TABLE Student (
    student_id integer,
    first_name varchar,
    last_name varchar,
    group_id integer,
    PRIMARY KEY(student_id),
    FOREIGN KEY(group_id) REFERENCES Groups(group_id)
);

CREATE TABLE Groups (
    group_id integer,
    name varchar,
    PRIMARY KEY(group_id)
);

CREATE TABLE Course (
    course_id integer,
    title varchar,
    PRIMARY KEY(course_id)
);

CREATE TABLE Tutor (
    course_id integer,
    tutor_id integer,
    group_id integer,
    PRIMARY KEY(tutor_id),
    FOREIGN KEY(course_id) REFERENCES Course(course_id),
    FOREIGN KEY(group_id) REFERENCES Groups(group_id)
);

I would appreciate it, if somebody could help me understand, if I have missed anything or if I am along the right lines in trying to optimise the query that I have.

CodePudding user response:

  • You query does not use LEFT OUTER JOIN as, in the WHERE filter, score.date > '01-MAR-2020' and course.course_id IS NOT NULL can never match NULL values so the OUTER joined rows will be eliminated.
  • Given that then WHERE course.course_id IS NOT NULL is redundant as the join clause ON tutor.course_id = course.course_id also enforces the NOT NULL condition.
  • '01-MAR-2020' is not a date; it is a string literal that looks like a date and relies on the SQL engine performing an implicit cast from string-to-date. Use DATE '2020-03-01' instead.
  • score.student_id in (SELECT student_id FROM students) is enforced by the FOREIGN KEY constraint. You don't need to query the students table, you only need to check that the value is NOT NULL.
  • SUBSTR(score.date, 4, 2) relies on an implicit date-to-string conversion. Either use EXTRACT(MONTH FROM score.date) to directly get the month number (if you want to group 2022-03 with 2020-03) or use TRUNC(score.date, 'MM') (if 2020-03 should not be grouped with 2021-03).

Which makes your query:

SELECT tutor.tutor_id, 
       tutor.subject_id,
       course.title,
       TRUNC(score.date, 'MM') AS month,
       AVG(score.score)
FROM   tutor
       INNER JOIN course ON tutor.course_id = course.course_id
       INNER JOIN score  ON tutor.course_id = score.course_id
WHERE  score.date > DATE '2020-03-01' 
AND    score.student_id IS NOT NULL
GROUP BY 
       tutor.tutor_id, 
       tutor.subject_id,
       course.title,
       TRUNC(score.date, 'MM');

As for optimisations:

You can try aggregating the score before joining the other tables:

SELECT  t.tutor_id,
        t.subject_id,
        c.title,
        s.month,
        s.avg_score
FROM    ( SELECT course_id,
                 TRUNC(date, 'MM') AS month,
                 AVG(score) AS avg_score
          FROM   score
          WHERE  date > DATE '2020-03-01'
          AND    student_id IS NOT NULL
          GROUP BY
                 course_id,
                 TRUNC(date, 'MM')
        ) s
        INNER JOIN tutor t  ON t.course_id = s.course_id
        INNER JOIN course c ON c.course_id = s.course_id

This means that you are aggregating by the primary key course_id and not by course.title. But that is probably a good thing as aggregating by title would combine two different courses that happen to have the same name; which is probably not what you want.

  • Related