Home > Enterprise >  JOIN - why these queries provide different output?
JOIN - why these queries provide different output?

Time:11-04

I have used a sample database to test out two queries. I expected them to have the same output, but the output is different and I can't understand why.

Could anyone please explain to me:

  1. Why ver 3 provides different output than ver 2 ?
  2. What is it about rows 10 24 33 41 50 59 67 73 100 132 139 147 162 169 that caused them to be excluded from ver 3 output ? I can't find any differentiating factor from all other rows that were included in the output.

Ver 2 code

SELECT 
    de.emp_no, de.from_date, de.to_date
FROM
    dept_emp_dup de
        JOIN
    (SELECT 
        emp_no, MAX(from_date) AS from_date, to_date
    FROM
        dept_emp_dup
    WHERE
        to_date > SYSDATE()
    GROUP BY emp_no) de1 ON de1.emp_no = de.emp_no
WHERE
    de1.to_date = de.to_date;

Ver 3 code

SELECT 
    de.emp_no, de.from_date, de.to_date
FROM
    dept_emp_dup de
        JOIN
    (SELECT 
        emp_no, MAX(from_date) AS from_date, to_date
    FROM
        dept_emp_dup
    GROUP BY emp_no) de1 ON de1.emp_no = de.emp_no
WHERE
    de1.to_date > SYSDATE()
        AND de1.to_date = de.to_date;

The excel file in this LINK below contains sheets with

  1. the full data
  2. the correct output as filtered in excel
  3. ver 2 query output
  4. ver 3 query output

CodePudding user response:

This is a comment that doesn't fit in the comments section. Don't upvote it.

The subquery in the first query is not well formed, and unfortunately MySQL accepts it and runs it. This is the code you have:

SELECT 
  emp_no, MAX(from_date) AS from_date, to_date
FROM
  dept_emp_dup
WHERE
  to_date > SYSDATE()
GROUP BY emp_no

All columns not included in the GROUP BY clause should be aggregated in the select list. The offending column is to_date. In short, it should me modified either to:

SELECT emp_no, MAX(from_date) AS from_date, to_date
FROM dept_emp_dup
WHERE to_date > SYSDATE()
GROUP BY emp_no, to_date -- added to the GROUP BY clause

or to:

SELECT emp_no, MAX(from_date) AS from_date, 
  MAX(to_date) as to_date -- aggregated with MAX()
FROM dept_emp_dup
WHERE to_date > SYSDATE()
GROUP BY emp_no

CodePudding user response:

I would check that the date format is saved exactly the same. If its a different OS or even a different device the date format night be different. Even within the spreadsheet the date format for the column might be different. You could cast each date into a single format in your app. So you read the date in and format the date then proceed with the processing.

  • Related