I tried the following SQL query and did not understand how last part in the parenthesis
SELECT COUNT(c2.CustomerID)
FROM Customers AS c2
GROUP BY c2.Country
HAVING c2.Country <> c.Country
works.
SELECT COUNT(c.CustomerID) AS cnt, c.Country
FROM Customers c
GROUP BY c.Country
HAVING cnt NOT IN (SELECT COUNT(c2.CustomerID)
FROM Customers AS c2
GROUP BY c2.Country
HAVING c2.Country <> c.Country);
The first three lines give me country name and count of their customer id.
For example:
Argentina 3
Belgium 3
Brazil 9
IRAN 2
Turkey 2
The last part
SELECT COUNT(c2.CustomerID)
FROM Customers AS c2
GROUP BY c2.Country
HAVING c2.Country <> c.Country
only returns count customer id of country for which there is other countries with the same number of count. How does this work?
For example:
3
2
and don't give 9 for the brazil.
I don't really understand the second condition of the HAVING query works: Having c2.Country <> c.Country. I know what c2.Country <> c.Country means. I don't understand how the bold line works.
When The query in the parenthesis is linked to the original query with a "HAVING NOT IN" which essentially compares the values from the first part of the query and the values from the second (Argentina 3, Belgium 3, IRAN 2, Turkey 2) and will return all rows that are not present in the second query, which is Brazil 9.
CodePudding user response:
Given:
CREATE TABLE Customers (
CustomerID int PRIMARY KEY AUTO_INCREMENT
, Country VARCHAR(20)
);
INSERT INTO Customers (Country) VALUES
('Argentina')
, ('Argentina')
, ('Argentina')
, ('Belgium')
, ('Belgium')
, ('Belgium')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('Brazil')
, ('IRAN')
, ('IRAN')
, ('Turkey')
, ('Turkey')
;
and the SQL:
SELECT COUNT(c.CustomerID) as cnt, c.Country
FROM Customers c
GROUP BY c.Country
Having cnt NOT IN (SELECT COUNT(c2.CustomerID) FROM Customers AS c2 GROUP BY c2.Country Having c2.Country <> c.Country)
;
The above SQL has dependent / correlated
behavior, where the subquery has a reference to the outer row (c.Country
).
Functionally, that means we execute the subquery for each row generated by the outer GROUP BY
clause.
This also means we have to be careful to only reference the outer row with expressions that are functionally dependent on the outer GROUP BY
terms.
First, let's look at the entire result of the outer GROUP BY
logic (without the HAVING
logic):
Country | cnt |
---|---|
Argentina | 3 |
Belgium | 3 |
Brazil | 9 |
IRAN | 2 |
Turkey | 2 |
Let's take just one outer row as an example:
Country = 'IRAN', cnt = 2
When we execute the subquery for this row, we effectively do this:
(SELECT COUNT(c2.CustomerID) FROM Customers AS c2 GROUP BY c2.Country Having c2.Country <> 'IRAN')
This produces a list of (only selecting the cnt):
Country | cnt |
---|---|
Argentina | 3 |
Belgium | 3 |
Brazil | 9 |
Turkey | 2 |
But without 'IRAN' due to the HAVING
clause: Having c2.Country <> c.Country
The list of cnt
produced is: 3, 3, 9, 2
Now, since the outer row is associated with cnt = 2
, the NOT IN
clause eliminates that outer row from the final result.
The same thing happens to Turkey, Argentina and Belgium, as we process those outer rows.
The only remaining outer row is: Brazil with cnt = 9
Let me know if you need more information.
CodePudding user response:
The <> is a not equals operator which will return any countries with a customer ID count that is not equal to another country (in this instance Brazil 9). This query in the parenthesis is then linked to the original query with a "HAVING NOT IN" which essentially compares the values from the first part of the query and the values from your second (Brazil 9) and will return all rows that are not present in the second query - the one in your parenthesis. If there was another country with a customer ID count of 9 then Brazil would appear in your final results.
I do agree with @jarlh though that this query is rather odd and likely not fully optimal.
CodePudding user response:
First things first, your query should return the count first then the name of the country. SELECT COUNT(c.CustomerID) as cnt, c.Country
not the vice versa. Secondly, <>
returns the rows where two columns are not equal to each other, which in this case is the country in the outer query does not equal to the country in the inner query.