I need to complete a full outer join using two columns: date_local
and hour_local
but am concerned about query performance when using an outer join as opposed to another type of join.
Below is the query using outer join:
SELECT *
FROM TABLE_A
FULL OUTER JOIN TABLE_B
USING (DATE_LOCAL, HOUR_LOCAL)
Would the following query perform better than the query above:
WITH JOIN_VALS AS
(SELECT DATE_LOCAL
, HOUR_LOCAL
FRΩM TABLE_A
UNION
SELECT
DATE_LOCAL
, HOUR_LOCAL
FROM TABLE_B
)
SELECT
JV.DATE
, JV.HOUR_LOCAL
, TA.PLANNED
, TB.ACTUAL
FROM JOIN_VALS JV
LEFT JOIN TABLE_A TA
ON JV.DATE = TA.DATE
AND JV.HOUR_LOCAL = TA.HOUR_LOCAL
LEFT JOIN TABLE_B TB
ON JV.DATE = TB.DATE
AND JV.HOUR_LOCAL = TB.HOUR_LOCAL;
Wondering if I get any performance improvements but isolating the unique join values first, rather than finding them during the outer join.
CodePudding user response:
UNION can be expensive and I don’t think you will seen any benefit from this construct in Redshift. Likely performance loss. Redshift is a columnar database and will see no benefit from peeling off these columns.
The big cost will be if the matches between the two tables on these two columns will be many-to-many. This would lead to additional row creation which could lead to slow performance.