Home > Software design >  Why don't these two SQL queries do the same thing?
Why don't these two SQL queries do the same thing?

Time:05-02

The following is from Hacker Rank SQL (Basic) Certification from this page, https://www.hackerrank.com/skills-verification

These are the instructions

This is an example of the two tables I am working with

BLOCK 1

SELECT ei.employee_ID, ei.name

FROM employee_information AS ei, Last_quarter_bonus AS lqb

WHERE ei.division = 'HR'

AND lqb.bonus >= 5000;

BLOCK 2

SELECT ei.employee_ID, ei.name

FROM employee_information AS ei, Last_quarter_bonus AS lqb

WHERE ei.division = 'HR'

AND lqb.bonus >= 5000

AND ei.employee_ID = lqb.employee_id;

END SQL QUERIES

The only difference between these two queries the final line, line 5 of BLOCK 2:

AND ei.employee_ID = lqb.employee_id;

But i dont know why that line is necessary. The first block of code should compile fine in my eyes. I don't understand what the necessity of line 5 is. It seems to me that line 3 and line 4 already make line 5 true without having to write line 5.

So why do i have to write line 5 to get the query to compile the way i want it to?
Why do these two queries spit out different results?

CodePudding user response:

Here is the BLOCK 2 query using an explicit JOIN. This is the preferred syntax as mentioned in the comments.

SELECT ei.employee_ID, ei.name
FROM employee_information AS ei JOIN Last_quarter_bonus AS lqb
    ON ei.employee_ID = lqb.employee_ID
WHERE ei.division = 'HR' AND lqb.bonus >= 5000;

Here the condition is specified in the ON clause. This condition is needed to only include rows joined where the employee is the same from employee_information and last_quarter_bounus.

CodePudding user response:

AND ei.employee_ID = lqb.employee_id;

But i dont know why that line is necessary.

This line represents the join condition. The query involves 2 tables, as you can see in the FROM clause. The comma between the table names will cause a CROSS JOIN (which is maybe not so obvious in this case, as there is just a comma).

The second query uses an INNER JOIN, which is also not obvious, as the FROM clause looks the same (table names, comma -> cross join) as in the first query. However, the condition ei.employee_ID = lqb.employee_id enforces an inner join, which will give you less rows in the result set.

In order to make the SQL code clearer, many people recommend JOIN ... ON ... syntax for inner joins, and CROSS JOIN syntax for cross joins.

Equivalents:

-- use this
select *
from ei cross join lqb ;

-- avoid
select *
from ei, lqb ;

and for inner joins...

-- use this
select *
from ei join lqb on ei.id = lqb.id ;

-- avoid
select *
from ei, lqb
where ei.id = lqb.id ;

You can see that in the case of the inner join, the WHERE clause will become cluttered with join conditions if you use just a list of tables (ie table name and commas) in the FROM clause (the more tables you join, the worse this effect becomes).

DBfiddle here.

CodePudding user response:

It is necessary for the fifth line. In the question, it is asking you to look at both tables and write a statement that will filter the output to display any employees that are in HR and make a bonus of 5000 or more.

Syntax preference aside,

When you go about doing this in block two, you have three conditions you're looking for compared to two in block one. You have:

  • the condition to look for an employee who works in hr (ei.division = 'HR')
  • the condition to look for an employee who earns a bonus of 5000 or more (lqb.bonus >= 5000) AND
  • the condition that relates the two tables together(ei.employee_ID = lqb.employee_id).

Because SQL is dumb, you have to tell it exactly what you want it to do. It can't relate the two searches without relating the two tables.

If you had all this information in one table, you'd be able to use the two conditions in block 1(without the need of a join) and get the result you're looking for but since there are two tables, you have to include it.

Now as for syntax, when you declare a join you typically assign which column is shared among both tables. So if you were to write this statement with a join, the last condition would be baked into the join command rather than being at the bottom(SELECT * FROM database1 AS db1 JOIN database2 AS db2 ON db1.commonColumn=db2.commonColumn). This would make it look like this:

SELECT ei.employee_ID, ei.name       // columns to display
FROM employee_information AS ei      // first table to look at
JOIN Last_quarter_bonus AS lqb       // second table to look at
ON ei.employee_ID = lqb.employee_id  // shared column between both tables
WHERE ei.division = 'HR'             // condition 1
AND lqb.bonus >= 5000;               // condition 2
  • Related