Home > Back-end >  How to use write a SQL join when there exists empty cell in the join column?
How to use write a SQL join when there exists empty cell in the join column?

Time:12-30

Schema:

CREATE TABLE PRODUCT
(
    PRODUCTID INT, 
    PRODUCTNAME VARCHAR(100), 
    PRODUCTUSER VARCHAR(100)
);

CREATE TABLE USER
(
    USERID INT, 
    USERNAME VARCHAR(100),
    USEREMAIL VARCHAR(100)
);

INSERT INTO PRODUCT(PRODUCTID,PRODUCTNAME,PRODUCTUSER) 
VALUES (1, 'Product1', 'Chen'), (2, 'Product2', 'Bob'), 
       (3, 'Product3', ''), (4, 'Product4', '');

INSERT INTO USER (USERID, USERNAME, USEREMAIL) 
VALUES (1, 'Chen', '[email protected]'),
       (2, 'Bob', '[email protected]'),
       (3, 'Paul', '[email protected]'),
       (4, '', ''), (5, '', '');

Product table:

ProductID  ProductName  ProductUser
------------------------------------
1          Product1     Chen
2          Product2     Bob
3          Product3      
4          Product4 

User table:

UserID  UserName  UserEmail
--------------------------------
1       Chen      [email protected]
2       Bob       [email protected]
3       Paul      [email protected]
4    
5

I want to join Product and User table to get all the Product Names and the User name (if available) as output. The challenge is that the common field Product.ProductUser and User.UserName both contain empty values as shown in example above. I know this is not a good schema design but I am cannot change the schema as it is out of my control.

Expected output:

PROCUTNAME   USERNAME
----------------------
Product1     Chen
Product2     Bob
Product3
Product4

Sample query 1:

SELECT PRODUCTNAME, USERNAME
FROM PRODUCT P
JOIN USER U
ON P.PRODUCTUSER=U.USERNAME

Above query is an inner join so returns duplicates for Product3 and Product4 due to the join on empty. Adding a WHERE clause ProductUser<>'' results in exclusion of Product3 and Product4.

Usually if the ProductUser were NULL instead of empty then I know that a LEFT JOIN would be the solution. I think that I understand the difference between using the criteria in the ON clause and WHERE clause of the LEFT JOIN.

So trying a left join with criteria in the ON clause:

SELECT PRODUCTNAME, USERNAME
FROM PRODUCT P
LEFT JOIN USER U
ON P.PRODUCTUSER=U.USERNAME AND P.PRODUCTUSER<>''

So the above query works by 1st doing an inner join based on the given criteria in ON clause, and then pulls in all other rows from the product table and puts a null for username. (or in other words it first lists all products, and then joins on those records that satisfy the ON criteria.)

This gives me output as expected. But I am not sure whether my approach is correct so trying another approach:

SELECT PRODUCTNAME, USERNAME
FROM PRODUCT P
LEFT JOIN (SELECT * FROM USER WHERE USERNAME<>'') U
ON P.PRODUCTUSER=U.USERNAME

This also works.

Is the left join with criteria in the ON clause a correct way to approach the problem?

CodePudding user response:

Can you just do a group by to remove the duplicates?

declare @prod TABLE (PRODUCTID INT, PRODUCTNAME VARCHAR(100), PRODUCTUSER VARCHAR(100));

declare @user table (USERID INT, USERNAME VARCHAR(100),USEREMAIL VARCHAR(100));

INSERT INTO @prod(PRODUCTID,PRODUCTNAME,PRODUCTUSER) values (1,'Product1','Chen'),
(2,'Product2','Bob'),(3,'Product3',''),(4,'Product4','');

INSERT INTO @user(USERID,USERNAME,USEREMAIL) VALUES (1,'Chen','[email protected]'),
(2,'Bob','[email protected]'),(3,'Paul','[email protected]'),
(4,'',''),(5,'','');


select * from @prod
select * from @user

select 
    p.ProductId,
    p.ProductName,
    u.USERNAME
FROM @prod p
    left join @user u on p.PRODUCTUSER = u.USERNAME
group by
    p.PRODUCTID,
    p.PRODUCTNAME,
    u.USERNAME

enter image description here

CodePudding user response:

select
    main.PRODUCTNAME
    , case
        when sub.USERNAME is null
        then ''
        else sub.USERNAME
    end USERNAME

from #PRODUCT main
    left join
    #USER sub
    on main.PRODUCTUSER = sub.USERNAME
        and sub.USERNAME like '%[A-Za-z]%'
  • Related