Let's say I have a table, winners:

Player | Team |
---|---|
John | Blue |
Becky | Red |
And I would like to join it to a table, points:

Player | Team | Points |
---|---|---|
John | Blue | 8 |
John | Red | 3 |
Becky | Red | 7 |
And I would like to get the points for each winner. I know that I can do so using the following join:

```sql
SELECT
    W.PLAYER
  , W.TEAM
  , P.POINTS
FROM WINNERS W
INNER JOIN POINTS P
  ON W.PLAYER = P.PLAYER AND W.TEAM = P.TEAM;
```
Is there any performance loss if I write the query the following way, which uses a synthetic primary key to avoid joining on multiple fields?

```sql
WITH WINNERS_NEW AS (
    SELECT
        PLAYER || '_' || TEAM AS ID
      , PLAYER
      , TEAM
    FROM WINNERS
)
, POINTS_NEW AS (
    SELECT
        PLAYER || '_' || TEAM AS ID
      , POINTS
    FROM POINTS
)
SELECT
    WN.PLAYER
  , WN.TEAM
  , PN.POINTS
FROM WINNERS_NEW WN
INNER JOIN POINTS_NEW PN
  ON WN.ID = PN.ID;
```
The impetus behind this question is that in my actual use case, I need to repeatedly join on 7 fields, and can avoid this by creating a synthetic primary key earlier in the CTE, but I'm not sure if doing so will lead to significant performance losses.
CodePudding user response:
Yes, it can cause significant performance degradation.
If there are no indices on these columns, it's probably not an issue. But if the columns are indexed, joining on an expression instead of the columns themselves will prevent those indices from being used, greatly reducing the query planner's options. For what it's worth, it is also possible to create an index on the concatenated expression itself, though in my experience it would be unusual to do so purely for query conciseness; the more common use case for expression indices is supporting things like case-insensitive matching.
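You can see this effect directly with a plan inspection. Here is a minimal sketch using SQLite from Python (the index name `idx_points` is made up for the example, and plan text varies between database engines, so treat this as illustrative rather than representative of your own system):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE winners (player TEXT, team TEXT);
    CREATE TABLE points  (player TEXT, team TEXT, points INTEGER);
    CREATE INDEX idx_points ON points (player, team);
    INSERT INTO winners VALUES ('John', 'Blue'), ('Becky', 'Red');
    INSERT INTO points  VALUES ('John', 'Blue', 8), ('John', 'Red', 3),
                               ('Becky', 'Red', 7);
""")

# Join on the two columns: the planner can use idx_points for the lookup.
plan_columns = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT w.player, w.team, p.points
    FROM winners w
    JOIN points p ON w.player = p.player AND w.team = p.team
""").fetchall()

# Join on the concatenated expression: idx_points no longer applies,
# so the inner table has to be scanned instead.
plan_expression = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT w.player, w.team, p.points
    FROM winners w
    JOIN points p ON w.player || '_' || w.team = p.player || '_' || p.team
""").fetchall()

# The plan detail text is in the fourth column of each row.
for row in plan_columns + plan_expression:
    print(row[3])
```

Running this, the first plan mentions the index while the second does not, even though both queries return the same rows.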
As always, to get a definitive answer, measure it: try both forms on your own dataset.
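And to illustrate the expression-index option mentioned above: SQLite (3.9+) supports indexing the concatenated expression directly, in which case a query that repeats the expression verbatim can use it again. A minimal sketch (the index name `idx_points_concat` is invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE points (player TEXT, team TEXT, points INTEGER);
    -- Index on the concatenated expression itself.
    CREATE INDEX idx_points_concat ON points (player || '_' || team);
    INSERT INTO points VALUES ('John', 'Blue', 8), ('Becky', 'Red', 7);
""")

# Because the WHERE clause repeats the indexed expression exactly,
# the planner can satisfy the lookup via the expression index.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT points FROM points WHERE player || '_' || team = 'John_Blue'
""").fetchall()
for row in plan:
    print(row[3])
```

Note that the expression in the query must match the indexed expression exactly; a semantically equivalent but textually different expression will not use the index.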