Question about WHERE EXISTS and subquery on MySQL Coursera page-CodePudding

Edited again to add: the 2nd table ("dogs") have null user_guid values. If i filter out the NULL values then the NOT IN operator works without WHERE. And I still dont know why :|

Edited: Sorry I realized that I flipped the question. The question was the number of users who are in "users" table and NOT in the "dogs" table. My original code was:

select count(distinct u.user_guid)
from users u
where u.user_guid not IN (select d.user_guid
                         from dogs d)

Which got result of 0 count. The answer was supposed to be 2226 Only when I add the WHEN condition then i got the right answer:

select count(distinct u.user_guid)
from users u
where u.user_guid not IN (select d.user_guid
                         from dogs d
                         WHERE d.user_guid = u.user_guid);

I still don't know why the WHERE condition makes the difference here

The question was:

How could you determine the number of unique users in the "users" table who were also in the "dogs" table?

The correct answer was supposed to be

select count(u.user_guid)
from users u
where u.user_guid IN (select d.user_guid
                         from dogs d
                         where u.user_guid=d.user_guid);

Now I don't understand the point of the last WHERE filter. I thought the "IN" operator would have already made sure to filter only for values of u.user_guid that also exists in the d.user_guid (ie. the "IN" operator would have ensured that there are matching values) , which is the same as what the WHERE condition did (ie. u.user_guid = d.user_guid). And yet when I run the 2 codes (with and without the last WHERE line), the results were different. I'm not sure where I went wrong :/

CodePudding user response：

Below query finds the users who exists and are not unique on the dogs table , and based on the users found on the dogs table , only the unique values are returned from the users table:

select u.user_guid
from users u
where exists ( select d.user_guid
                         from dogs d
                         where u.user_guid=d.user_guid
             )
group by user_guid
having count(*)=1;

The last where condition where u.user_guid=d.user_guid returns only the users from dogs table which are found on the users table, so it should produce wrong results in your case.

If you want only the number change u.user_guid with count(u.user_guid)

select count(u.user_guid)
from users u
where exists ( select d.user_guid
                         from dogs d
                         where u.user_guid=d.user_guid
             )
group by user_guid
having count(*)=1;

CodePudding user response：

If you want to count unique users you might need distinct keyword if there are multiple rows for any user. For example: count( distinct u.user_guid).

For IN you don't need where u.user_guid=d.user_guid. But I would suggest the solution usingexists

UsingIN:

select count( u.user_guid)
from users u
where u.user_guid IN (select d.user_guid
                         from dogs d);

Using Exists:

select count( u.user_guid)
from users u
where exists IN (select 1
                         from dogs d
                          where u.user_guid=d.user_guid);

Examples of IN and exists In users table there are three value for user_guid. 1,2 and 3. 2 is there two times. But in dogs table user_guid are 2,3 and 5. 2 and 3 are common in both tables. Please check the queries and outputs.

Schema and insert statements:

 create table users(user_guid int);
 create table dogs(user_guid int);

 insert into users values(1);
 insert into users values(2);
 insert into users values(2);
 insert into users values(3);

 insert into dogs values(2);
 insert into dogs values(3);
 insert into dogs values(5);

Query: Using IN select count(distinct u.user_guid) unique_users
from users u
where u.user_guid IN (select d.user_guid from dogs d );

Output:

unique_users
2

Query: Using EXISTS

 select count(distinct u.user_guid) unique_users    
 from users u     
 where exists (select 1 from dogs d where u.user_guid = d.user_guid);

Output:

unique_users
2

db<>fiddle here

CodePudding user response：

Hey guys I think you're overcomplicating things here.
If you do a simple join you only get the values which are in both tables. If you do a left join the fields which aren't matched in the second table return null.
NB: if we join on a column which is nullable we could get false results but that's not good in any case.

create table dogs (d int);
create table user (u int);
insert into dogs values
(1),(2),(3),(4),
(1),(2),(5),(6);
insert into user values
(1),(2),(3),(4),
(1),(2),(7),(8);

SELECT COUNT(DISTINCT d) as du 
FROM dogs join user on d=u;

| du |
| -: |
|  4 |

SELECT COUNT(DISTINCT d) as dd
FROM dogs LEFT JOIN user on d=u
WHERE u IS NULL;

| dd |
| -: |
|  2 |

db<>fiddle here