Home > Software design >  LEFT JOIN with tables having boolean data producing unexpected result set
LEFT JOIN with tables having boolean data producing unexpected result set

Time:03-30

A Table_1 with only column BOOLVALUE(int) having records as

  1. 1
  2. 1
  3. 0
  4. 0
  5. 0

and another Table_2 with only column BOOLVALUE(int) having records as

  1. 1
  2. 1
  3. 1
  4. 0
  5. 0

.. I am trying to run a query

select t1.BOOLVALUE from Table_1 t1
left join Table_2 t2 on t1.BOOLVALUE=t2.BOOLVALUE

and to my surprise output is not what I expected.There are 12 rows with 6 1's and 6 0's. But doesn't this invalidates how joins work ?

CodePudding user response:

12 rows is completely expected as you have 2 rows related to 3 rows, resulting in 6 rows, and 3 rows related to 2 rows resulting in 6 rows; add these together and you get 12.

When you JOIN all related rows are JOINed based on the ON clause. Your ON clause is t1.BOOLVALUE=t2.BOOLVALUE. This means all the 1s inTable_1 relate to all the 1s in Table_2; so that's 2 rows related to 3 rows (2 * 3). Then all the 0s inTable_1 relate to all the 0s in Table_2; so that's 3 rows related to 2 rows (3 * 2). Hence (2 * 3) (3 * 2) = 6 6 = 12.

If we add an ID column to the table, this might become a little clearer.

Let's say you have 2 tables like this:

ID1 I1
1 1
2 1
3 0
4 0
5 0
ID2 I2
1 1
2 1
3 1
4 0
5 0

Then lets say you have the following query:

SELECT T1.ID1,
       T2.ID2,
       T1.I1,
       T2.I2
FROM dbo.Table1 T1
     JOIN dbo.Table2 T2 ON T1.I1 = T2.I2
ORDER BY T1.ID1
         T2.ID2;

This would result in the following data set:

ID1 ID2 I1 I2
1 1 1 1
1 2 1 1
1 3 1 1
2 1 1 1
2 2 1 1
2 3 1 1
3 4 0 0
3 5 0 0
4 4 0 0
4 5 0 0
5 4 0 0
5 5 0 0

Here you can see you have a many to many join, and where the "extra" rows are coming from.

If you LEFT JOINed on the ID and I columns, starting at Table1, you would get 5 rows, with 1 row having NULL values for ID2 and I2 (in this case because although the ID matched, I did not):

SELECT T1.ID1,
       T2.ID2,
       T1.I1,
       T2.I2
FROM dbo.Table1 T1
     LEFT JOIN dbo.Table2 T2 ON T1.ID1 = T2.ID1
                            AND T1.I1 = T2.I2
ORDER BY T1.ID1
         T2.ID2;
ID1 ID2 I1 I2
1 1 1 1
2 2 1 1
3 NULL 0 NULL
4 4 0 0
5 5 0 0

CodePudding user response:

When you join on a column of which has repeating values the number of rows returned is the product of the number of matching values in the 2 tables.
In this case there are 2 1's in table 1 and 3 in table 2 so SQL returns the 6 possible combinations (2 x 3). As there are 3 x 2 zero combinations you get 12 rows in total.
If you did a cross join you would get 25 rows back (5 x 5).

  • Related