Can someone explain how the sql query works?-CodePudding

The below SQL query gets all the first_names which have different last names.

select a.first_name
from names a
WHERE a.first_name in (
select b.first_name 
  from names b
  WHERE a.last_name<>b.last_name
)

I am not able to figure out how this exactly works. What I thought would happens is, for every row in the table the subquery will check if there exists a row with a different last name. But the checking happens among only identical values of first name. Can someone explain how this actually works.

CodePudding user response：

Firstly, as I mentioned in the comments, I would recommend instead using a HAVING for this. It won't require 2 scans of the table, just one, it won't return duplicates for the same first_name and it's probably easier for you to understand:

SELECT first_name
FROM dbo.names
GROUP BY first_name
HAVING COUNT(DISTINCT last_name) > 1;

This should be fairly self explanative; it returns rows where there is more than 1 distinct value of last_name for each value of first_name.

For your query, let's just look at the WHERE:

WHERE a.first_name IN (SELECT b.first_name 
                       FROM names b
                       WHERE a.last_name<>b.last_name

Firstly, we "join" the 2 instances of names (aliased a ad b) where the value of last_name differs in the 2 instances. This will, likely, be an expensive operation; if you have a table will 1,000 rows, and it has 900 different values for last_name then you're going to end up with the join getting hundreds of matches for each row.

After that, it checks to see the value of first_name for the row in the instance of names aliased as a in the rowset from the join; it is does then there are 2 (or more) instances of same first_name with different last_name values.

Using an IN with a subquery is no different, in context, to using a literal list of values. WHERE SomeColumn IN (1,2,3,4,5,7) would be the same as WHERE SomeColumn IN (SELECT I FROM SomeTable) if the column I in the table SomeTable had the values 1, 2, 3, 4, 5, and 7.

CodePudding user response：

You can read it as an inner join like this, or an EXISTS :

SQL Fiddle

Query 2:

select a.first_name
from names a
inner join names b
on a.first_name = b.first_name 
  and a.last_name <> b.last_name
Group by a.first_name, a.last_name

Results:

| first_name |
|------------|
|       john |
|       john |
|       john |

Query 3:

select a.first_name
from names a
where exists( 
  select 1 from names b
  WHERE a.first_name = b.first_name 
    and a.last_name <> b.last_name
  )

Results:

| first_name |
|------------|
|       john |
|       john |
|       john |

CodePudding user response：

A syntax check comes first, followed by the creation of an expression tree (at which point you may additionally check for items and "line up") (i.e. fields do exist WITHIN the table). This is the initial stage; if there is a mistake, you just advise the submitter to get serious.
Then there's the analysis. A SQL query differs from a program in that it does not specify HOW to do a task; instead, it specifies WHAT THE RESULT IS. Logic based on sets. So you bring in a query analyzer (which ranges from awful to good depending on the product - Oracle has had crappy ones for a long time, DB2 has the most sensitive ones, including monitoring disc speed) to figure out how to best approach this result. This is a very intricate beast, and it may attempt dozens or hundreds of techniques before settling on the one he thinks is the quickest (cost based, basically some statistics).
Then that gets executed.