I've been studying Oracle SQL non-equi join and while there are so many different use cases, I noticed that joining tables using a range of values with non-equi join is basically the same as using WHERE clause.
Code from a great non-equi join tutorial here
SELECT r.id, r.name, h.id, h.address, h.rent, h.bedrooms
FROM renters r
JOIN houses h
ON h.district = r.preferred_district
AND h.rent BETWEEN r.min_rent AND r.max_rent
AND h.bedrooms >= r.min_bedrooms
WHERE h.id NOT IN (SELECT house_id FROM deals);
In the above example, I think AND h.rent BETWEEN r.min_rent AND r.max_rent
part and AND h.bedrooms >= r.min_bedrooms
can be placed in WHERE clause and return exactly the same result.
So what's the point of using non-equi join here?
My guess is that by applying constraints during the join operation would be more performant since the joined table would be smaller.
Is my guess correct? Is there anything I'm missing?
Thanks in advance mates.
CodePudding user response:
Functionally, when you have an inner join, there is no difference between specifying predicate conditions in the WHERE
clause or in the join conditions. Prior to Oracle adopting the SQL 99 join syntax (in 9i if memory serves), all conditions had to be specified in the WHERE
clause-- there was no JOIN
keyword.
From a performance standpoint, there should be no difference. The optimizer should produce the same query plan whether the conditions are specified in the WHERE
clause or as part of the join
. As the number of tables involved increases and the number of potential plans grows, that may not be strictly true in all cases. But you shouldn't expect consistently better or worse performance from different ways of writing the same query.
The benefit of putting the conditions in the join
clause is readability. In general, queries are easier to read when you put all the conditions that tell you how table A relates to table B together in the join
condition. If you put some of the conditions in the WHERE
clause, Oracle will give you the same result but it's likely to take the human reading your query longer to understand what results you're asking for. This is particularly true as your SQL statements start to get more complicated. If you were joining 5 or 6 tables, for example, a human reader would potentially have a dozen or so predicates in the where
clause that related to different joins tens of lines apart. Keeping all that in your head as your reading the SQL statement is a lot harder than if the join
clause contains everything you need to know to relate two tables together.
CodePudding user response:
The easiest way to get insight in such topic is to use EXPLAIN PLAN
. You get an overview of the join type and the used access and filter predicate.
Here an example of the (bit simplified) statement using part of the conditions in the where
clause
EXPLAIN PLAN SET STATEMENT_ID = 'jara2' into plan_table FOR
SELECT r.id, r.name, h.id, h.address, h.rent, h.bedrooms
FROM renters r
JOIN houses h
ON h.district = r.preferred_district
where h.rent BETWEEN r.min_rent AND r.max_rent;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara2','ALL'));
Result - cut for brevity
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 208 | 4 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 208 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| RENTERS | 1 | 104 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| HOUSES | 1 | 104 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("H"."DISTRICT"="R"."PREFERRED_DISTRICT")
filter("H"."RENT">="R"."MIN_RENT" AND "H"."RENT"<="R"."MAX_RENT")
The exact same excution plan you'll see for the original statement with the full join condition without where
clause.
You can your guess is not correct, in both cases Oracle makes a hash join with the equal predicate (access predicate for id 1
) and then throw away the rows that does not match the filter predicate for id 1
)
So the swaping of predicates between the join consition and WHERE
clause is not relavent for non-equi joins (except for estetics, readability and correct SQL), but BTW is very relevant for OUTER JOINS
as you can get a different results.