Home > Software engineering >  Why are my outputs "duplicates" or messed up when I run this query of multiple joins / par
Why are my outputs "duplicates" or messed up when I run this query of multiple joins / par

Time:02-23

SELECT employees$.emp_no, employees$.first_name, employees$.last_name, salaries$.salary, departments$.dept_name
FROM employees$, dept_emp$, departments$, salaries$
WHERE employees$.emp_no = dept_emp$.emp_no AND
dept_emp$.dept_no = departments$.dept_no AND
employees$.emp_no = salaries$.emp_no;

I also have tried

SELECT employees$.emp_no, employees$.first_name, employees$.last_name, salaries$.salary, dept_emp$.dept_no, departments$.dept_name
FROM employees$
join dept_emp$
on employees$.emp_no = dept_emp$.emp_no
join salaries$
on employees$.emp_no = dept_emp$.emp_no
join departments$
on dept_emp$.dept_no = departments$.dept_no

I'm supposed to have 400 rows, first query outputs 16000, second one 442. I'm not sure what is wrong, I have been stuck at this for days. I assume I would have to join 4 tables and from there do a > query. I have not got to the second part yet as my joins are having issues. I would appreciate any help, full question posted below. (after posting realized I may need dept_manager table as well, anyone can help by posting the full solution will be much appreciated, thank you)

Full question: "List all employees where their salary exceeds the salary of their manager."

ERD erd

output

output

CodePudding user response:

Since a number of your tables will have multiple records per employee with different valid periods (from_date - to_date, you need to filter these to the relevant period, which based on your question is today, so to get active salaries you'd use something like:

SELECT  *
FROM    salaries$ AS s
WHERE   s.from_date <= CURRENT_TIMESTAMP
AND     (s.to_date > CURRENT_TIMESTAMP OR s.to_date IS NULL);

You also need to apply the same logic to dept_emp and dept_manager.

As an aside, SQL Server's Temporal Tables are a great way of managing this, and make it so that there is just one record in the main table so you don't need to worry about these filters, but you can also query historic data at a given point in time very easily, e.g. SELECT * FROM Salaries FOR SYSTEM_TIME AS OF '20220201'. Appreciate that this is a homework question, so none of this is relevant right now, but may be of use in the future.

So, back to the original point, you need to add these filters into your query, which you can do as so:

SELECT  e.emp_no, e.first_name, e.last_name, s.salary, d.dept_no, d.dept_name
FROM    employees$ AS e
        JOIN dept_emp$ AS de
            ON de.emp_no = e.emp_no
            AND de.from_date <= CURRENT_TIMESTAMP
            AND (de.to_date > CURRENT_TIMESTAMP OR de.to_date IS NULL)
        JOIN salaries$ AS s
            ON s.emp_no = e.emp_no
            AND s.from_date <= CURRENT_TIMESTAMP
            AND (s.to_date > CURRENT_TIMESTAMP OR s.to_date IS NULL)
        JOIN departments$ AS d
            ON d.dept_no = de.dept_no;

*N.B I have used aliases here as this (in my opinion) makes queries much easier to read and write

The above should cover your basic scenario to get all employees and their salary. You then just need a couple more joins to get the managers salary, then you can apply your filter to check for employees with a higher salary than their manager:

SELECT  e.emp_no, e.first_name, e.last_name, s.salary, d.dept_no, d.dept_name
FROM    employees$ AS e
        JOIN dept_emp$ AS de
            ON de.emp_no = e.emp_no
            AND de.from_date <= CURRENT_TIMESTAMP
            AND (de.to_date > CURRENT_TIMESTAMP OR de.to_date IS NULL)
        JOIN salaries$ AS s
            ON s.emp_no = e.emp_no
            AND s.from_date <= CURRENT_TIMESTAMP
            AND (s.to_date > CURRENT_TIMESTAMP OR s.to_date IS NULL)
        JOIN departments$ AS d
            ON d.dept_no = de.dept_no
        JOIN dept_manager$ AS dm
            ON dm.dept_no = d.dept_no
            AND dm.from_date <= CURRENT_TIMESTAMP
            AND (dm.to_date > CURRENT_TIMESTAMP OR dm.to_date IS NULL)
        JOIN salaries$ AS ms -- Manager Salary
            ON ms.emp_no = dm.emp_no
            AND ms.from_date <= CURRENT_TIMESTAMP
            AND (ms.to_date > CURRENT_TIMESTAMP OR ms.to_date IS NULL)
WHERE   s.salary > ms.salary;

If this were my database I would be inclined to create a function to get records active of a particular date, e.g.

CREATE FUNCTION dbo.SalariesAsOf(@Date DATE)
RETURNS TABLE
AS
RETURN
(   SELECT  emp_no, salary
    FROM    Salaries$
    WHERE   s.from_date <= @Date
    AND     (s.to_date > @Date OR s.to_date IS NULL)
);

And repeat this for all the tables of this nature, which encapsulates this logic and makes it easier to re-use. This would change the final query to something like:

SELECT  e.emp_no, e.first_name, e.last_name, s.salary, d.dept_no, d.dept_name
FROM    employees$ AS e
        JOIN dbo.DeptEmpAsOf(CURRENT_TIMESTAMP) AS de
            ON de.emp_no = e.emp_no
        JOIN dbo.SalariesAsOf(CURRENT_TIMESTAMP) AS s
            ON s.emp_no = e.emp_no
        JOIN departments$ AS d
            ON d.dept_no = de.dept_no
        JOIN dbo.DeptManagerAsOf(CURRENT_TIMESTAMP) AS dm
            ON dm.dept_no = d.dept_no
        JOIN dbo.SalariesAsOf(CURRENT_TIMESTAMP) AS ms -- Manager Salary
            ON ms.emp_no = dm.emp_no
WHERE   s.salary > ms.salary;

Again, I think this is likely well out of scope for the actual homework question, but it is what I would do, so thought I would add my 2p worth to the end of the question.

  • Related