Home > front end >  Improve performance on greatest n with multiple join
Improve performance on greatest n with multiple join

Time:12-06

So I was playing some sql indexes and see if I can improve my sql performance. I'm using the data from employees table https://github.com/datacharmer/test_db, and dump the employees.sql.

After looking up the structure a bit, my question was how do I get 10 people with the highest salary and what are the title. I come up with some solution.

select e.emp_no, e.first_name, e.last_name, e.gender, salaries.salary, titles.title from employees e
inner join ( 
    select s.emp_no, s.salary from salaries as s  
    left outer join salaries as s2
    on s.emp_no = s2.emp_no 
    and s.salary < s2.salary
    where s2.emp_no is null
) salaries on salaries.emp_no = e.emp_no 
inner join (
    select t.emp_no, t.title from titles as t  
    left outer join titles as t2
    on t.emp_no = t2.emp_no 
    and t.to_date < t2.to_date 
    where t2.emp_no is null
) titles on titles.emp_no = e.emp_no
order by salaries.salary desc
limit 10;

Basicly since the employees have one-to-many relationship for titles and salaries I must group the salaries and titles by its latest greatest value, for salaries it will be salary column and for titles it will be the to_date column.

The SQL works perfect, but it was so slow ~ even after creating some index.

create index salary_emp_no_index on salaries (salary, emp_no);
create unique index emp_first_last_name_index on employees (emp_no, first_name, last_name, gender);
create index titles_emp_title on titles (emp_no, title)

What I am doing wrong here? Is there still a room for improvement?

Edit (added sqlfiddle)

http://sqlfiddle.com/#!9/72111d/1

CodePudding user response:

Two points:

First, to get the greatest salary for each emp_no and limit to 10, you don't need a subquery or join at all, you can simply do (untested):

select emp_no, max(salary) max_salary
from salaries
group by emp_no
order by max_salary desc
limit 10

Second, you only need the titles for those (up to) 10 employees, so use your max salary query as a cte or subquery and only look for titles for those employees, which should be much faster. Additionally, you can avoid the self-join on titles using the approach in https://stackoverflow.com/a/15422121/17389.

  • Related