How to put a nested query in a JOIN?

Let's say that I have the following two tables:

TABLE1

+-------+-------+-------+
| data1 | data2 | data3 |
+-------+-------+-------+
|     1 |    12 |    13 |
|     2 |    22 |    23 |
|     3 |    32 |    33 |
+-------+-------+-------+

TABLE2

+-------+-------+-------+
| data1 | data4 | data5 |
+-------+-------+-------+
|     1 |  NULL |   015 |
|     1 |    14 |   115 |
|     1 |    14 |   115 |
|     2 |  NULL |   025 |
|     2 |    24 |   125 |
|     2 |    24 |   125 |
|     3 |  NULL |   035 |
|     3 |    34 |   135 |
|     3 |    34 |   135 |
+-------+-------+-------+

And I have the following query:

SELECT TABLE1.data1,
       TABLE1.data2,
       TABLE1.data3,
       (SELECT TOP 1
               data4
        FROM TABLE2
        WHERE data1 = TABLE1.data1
          AND data4 IS NOT NULL) AS data4,
       (SELECT TOP 1
               data5
        FROM TABLE2
        WHERE data1 = TABLE1.data1
          AND data4 IS NOT NULL) AS data5
FROM TABLE1;

QUERY RESULT

+-------+-------+-------+-------+-------+
| data1 | data2 | data3 | data4 | data5 |
+-------+-------+-------+-------+-------+
|     1 |    12 |    13 |    14 |   115 |
|     2 |    22 |    23 |    24 |   125 |
|     3 |    32 |    33 |    34 |   135 |
+-------+-------+-------+-------+-------+

Assume that TABLE2 meets these two conditions:

For each data1, data4 is either NULL or has the same value in every row.
For each data1, data5 has one value in every row where data4 is NULL and another value in every row where data4 is not NULL.
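Both conditions can be checked against the data before relying on them. The sketch below is an illustration, not part of the original question: it loads the sample tables into an in-memory SQLite database through Python's standard sqlite3 module (so it runs anywhere; values such as 015 are stored as plain integers) and lists every data1 that breaks either condition. The query itself uses only standard SQL (GROUP BY, HAVING, COUNT(DISTINCT ...)); the names make_db, CHECK_SQL and violations are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Load the sample TABLE1/TABLE2 from the question into an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER);
        CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER);
        INSERT INTO TABLE1 VALUES (1, 12, 13), (2, 22, 23), (3, 32, 33);
        INSERT INTO TABLE2 VALUES
            (1, NULL, 15), (1, 14, 115), (1, 14, 115),
            (2, NULL, 25), (2, 24, 125), (2, 24, 125),
            (3, NULL, 35), (3, 34, 135), (3, 34, 135);
    """)
    return con


# One result row per data1 that violates a condition. COUNT(DISTINCT ...) ignores
# NULLs, and a CASE without ELSE yields NULL, so each count sees one subset of rows.
CHECK_SQL = """
SELECT data1,
       COUNT(DISTINCT data4)                                      AS n_data4,
       COUNT(DISTINCT CASE WHEN data4 IS NULL THEN data5 END)     AS n_data5_null,
       COUNT(DISTINCT CASE WHEN data4 IS NOT NULL THEN data5 END) AS n_data5_not_null
FROM TABLE2
GROUP BY data1
HAVING COUNT(DISTINCT data4) > 1
    OR COUNT(DISTINCT CASE WHEN data4 IS NULL THEN data5 END) > 1
    OR COUNT(DISTINCT CASE WHEN data4 IS NOT NULL THEN data5 END) > 1
ORDER BY data1
"""


def violations(con: sqlite3.Connection) -> list:
    """Return (data1, n_data4, n_data5_null, n_data5_not_null) for each offending data1."""
    return con.execute(CHECK_SQL).fetchall()


if __name__ == "__main__":
    print(violations(make_db()))  # → []
```

An empty result means the sample data satisfies both conditions; a row such as (1, 14, 999) added to TABLE2 would make data1 = 1 show up with n_data5_not_null = 2.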

Is there a way to rewrite the query so that it has no nested queries in the SELECT list? Maybe using JOINs? I'm asking because I've noticed that the nested queries in the SELECT perform quite poorly. However, when I try a JOIN, I end up with duplicates of the rows where data4 is not NULL.
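The duplicates come from the sample data itself: TABLE2 holds two identical non-NULL rows for each data1, so a plain join matches every TABLE1 row twice. A small reproduction, using an in-memory SQLite database through Python's standard sqlite3 module as a stand-in for the real server (the table setup is copied from the sample tables above, with 015 and similar values stored as integers):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER);
    CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER);
    INSERT INTO TABLE1 VALUES (1, 12, 13), (2, 22, 23), (3, 32, 33);
    INSERT INTO TABLE2 VALUES
        (1, NULL, 15), (1, 14, 115), (1, 14, 115),
        (2, NULL, 25), (2, 24, 125), (2, 24, 125),
        (3, NULL, 35), (3, 34, 135), (3, 34, 135);
""")

# A straightforward join: the NULL rows are filtered out, but nothing reduces
# the two remaining matches per data1 to a single row.
NAIVE_JOIN = """
SELECT t1.data1, t1.data2, t1.data3, t2.data4, t2.data5
FROM TABLE1 AS t1
JOIN TABLE2 AS t2
  ON t2.data1 = t1.data1
 AND t2.data4 IS NOT NULL
ORDER BY t1.data1
"""

rows = con.execute(NAIVE_JOIN).fetchall()
for row in rows:
    print(row)
# Each data1 appears twice: 6 rows instead of the 3 the scalar subqueries return.
```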

Answer:

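You can use OUTER APPLY or CROSS APPLY. OUTER APPLY keeps TABLE1 rows that have no matching TABLE2 row (returning NULLs, as your scalar subqueries do), while CROSS APPLY drops them. A single APPLY also takes data4 and data5 from the same row: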

SELECT TABLE1.data1,
       TABLE1.data2,
       TABLE1.data3,
       t2.data4,
       t2.data5
FROM TABLE1
OUTER APPLY (SELECT TOP 1
                    x.data4,
                    x.data5
             FROM TABLE2 x
             WHERE x.data1 = TABLE1.data1
               AND x.data4 IS NOT NULL
             -- TOP should have an ORDER BY, otherwise which row is returned is not guaranteed.
             -- SomeColumn is a placeholder for the column that defines the "first" row.
             ORDER BY x.SomeColumn) t2;
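If you would rather express this as a JOIN, as the question asks, a common alternative (not taken from this answer) is to number the candidate TABLE2 rows per data1 with ROW_NUMBER() in a derived table and join only row number 1. A LEFT JOIN keeps unmatched TABLE1 rows, like OUTER APPLY; an inner JOIN would behave like CROSS APPLY. The sketch runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module; ROW_NUMBER() and LEFT JOIN use the same syntax in SQL Server. `ORDER BY data4, data5` is an arbitrary stand-in for the SomeColumn placeholder, and the TABLE1 row with data1 = 4 is hypothetical, added only to show the unmatched case.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER);
    CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER);
    -- data1 = 4 is a hypothetical row with no matching TABLE2 row
    INSERT INTO TABLE1 VALUES (1, 12, 13), (2, 22, 23), (3, 32, 33), (4, 42, 43);
    INSERT INTO TABLE2 VALUES
        (1, NULL, 15), (1, 14, 115), (1, 14, 115),
        (2, NULL, 25), (2, 24, 125), (2, 24, 125),
        (3, NULL, 35), (3, 34, 135), (3, 34, 135);
""")

# rn = 1 marks the first candidate row per data1. The rn = 1 test sits in the
# ON clause: in WHERE it would discard the NULL-extended row for data1 = 4.
RN_JOIN = """
SELECT t1.data1, t1.data2, t1.data3, t2.data4, t2.data5
FROM TABLE1 AS t1
LEFT JOIN (
    SELECT data1, data4, data5,
           ROW_NUMBER() OVER (PARTITION BY data1 ORDER BY data4, data5) AS rn
    FROM TABLE2
    WHERE data4 IS NOT NULL
) AS t2
  ON t2.data1 = t1.data1
 AND t2.rn = 1
ORDER BY t1.data1
"""

rows = con.execute(RN_JOIN).fetchall()
for row in rows:
    print(row)
# (1, 12, 13, 14, 115)
# (2, 22, 23, 24, 125)
# (3, 32, 33, 34, 135)
# (4, 42, 43, None, None)
```

Whether this beats OUTER APPLY depends on indexes and the execution plan, so it is worth comparing both on the real data.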

Answer:

I notice that in your TABLE2, once the rows with a NULL data4 are left out, the remaining rows for each data1 are identical. So a SELECT DISTINCT is easy to code and, although resource intensive (in essence it is a GROUP BY over all columns), good enough for this example. And if your data does contain differences, the result will suddenly have duplicates you did not expect, which will point you toward further investigation of the data.

That said, here you go:

WITH
-- your input ..
tb1(data1,data2,data3) AS (
          SELECT 1,12,13
UNION ALL SELECT 2,22,23
UNION ALL SELECT 3,32,33
)
,
tb2(data1,data4,data5) AS (
          SELECT 1,NULL,015
UNION ALL SELECT 1,14,115
UNION ALL SELECT 1,14,115
UNION ALL SELECT 2,NULL,025
UNION ALL SELECT 2,24,125
UNION ALL SELECT 2,24,125
UNION ALL SELECT 3,NULL,035
UNION ALL SELECT 3,34,135
UNION ALL SELECT 3,34,135
)
-- end of your input.
-- Real Query starts here; replace following comma with "WITH" ..
,
tb2grp AS (
  SELECT DISTINCT
    *
  FROM tb2
  WHERE data4 IS NOT NULL
  -- chk  data1 | data4 | data5 
  -- chk -------+-------+-------
  -- chk      1 |    14 |   115
  -- chk      2 |    24 |   125
  -- chk      3 |    34 |   135
)
SELECT
  tb1.data1
, tb1.data2
, tb1.data3
, tb2.data4
, tb2.data5
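-- SQL Server does not support JOIN ... USING, so the join condition is spelled out with ON.
-- An inner JOIN drops tb1 rows that have no non-NULL data4; use LEFT JOIN to keep them.
FROM tb1 JOIN tb2grp AS tb2 ON tb2.data1 = tb1.data1
ORDER BY tb1.data1;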
-- out  data1 | data2 | data3 | data4 | data5 
-- out -------+-------+-------+-------+-------
-- out      1 |    12 |    13 |    14 |   115
-- out      2 |    22 |    23 |    24 |   125
-- out      3 |    32 |    33 |    34 |   135
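As the answer says, the DISTINCT in tb2grp amounts to a GROUP BY over all of its columns. The sketch below spells that out with an explicit GROUP BY and uses a LEFT JOIN, so that a TABLE1 row without any non-NULL data4 comes back with NULLs (as it would from the scalar subqueries in the question) instead of being dropped. It runs on an in-memory SQLite database through Python's sqlite3 module as a portable stand-in; the names SETUP, GROUPED_JOIN and run() and the extra test row are hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER);
CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER);
INSERT INTO TABLE1 VALUES (1, 12, 13), (2, 22, 23), (3, 32, 33);
INSERT INTO TABLE2 VALUES
    (1, NULL, 15), (1, 14, 115), (1, 14, 115),
    (2, NULL, 25), (2, 24, 125), (2, 24, 125),
    (3, NULL, 35), (3, 34, 135), (3, 34, 135);
"""

# GROUP BY on every selected column collapses identical rows, like SELECT DISTINCT.
GROUPED_JOIN = """
WITH tb2grp AS (
    SELECT data1, data4, data5
    FROM TABLE2
    WHERE data4 IS NOT NULL
    GROUP BY data1, data4, data5
)
SELECT tb1.data1, tb1.data2, tb1.data3, tb2.data4, tb2.data5
FROM TABLE1 AS tb1
LEFT JOIN tb2grp AS tb2 ON tb2.data1 = tb1.data1
ORDER BY tb1.data1, tb2.data5
"""


def run(extra_sql: str = "") -> list:
    """Build the sample tables, apply optional extra statements, return the query result."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP + extra_sql)
    return con.execute(GROUPED_JOIN).fetchall()


print(run())
# → [(1, 12, 13, 14, 115), (2, 22, 23, 24, 125), (3, 32, 33, 34, 135)]

# Breaking the second condition (a new data5 for data1 = 1) surfaces as a duplicate data1.
print(run("INSERT INTO TABLE2 VALUES (1, 14, 999);"))
```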