Home > database >  SQL max is non-deterministic?
SQL max is non-deterministic?

Time:06-12

I have two tables: cities and states. States has columns for state codes and full name. Cities contains columns for population, state code, and the city name. My goal is to create a table of the city in each state with the highest population.

This is my solution which seems to work in a test, but I've been told that using max() is non-deterministic and I should use a window function instead.

SELECT
    s.name,
    c.name,
    max(c.population)
FROM cities AS c 
LEFT JOIN states AS s
ON c.state_code = s.code
GROUP BY s.name
ORDER BY s.name;

What is wrong with using max here, when would it give incorrect results?

CodePudding user response:

 create table states(code varchar(50),name varchar(50));
 create table cities(code varchar(50),name varchar(50),population int, state_code varchar(50));

 insert into states values('s01','state1');

 insert into cities values('c01','city1',100,'s01');
 insert into cities values('c02','city2',10,'s01');

Query:

 with cte as
 (
   SELECT
     s.name state_name,
     c.name city_name,
     c.population,
     row_number()over(partition by s.name order by c.population desc)rn
   FROM cities AS c 
   LEFT JOIN states AS s
   ON c.state_code = s.code
 )
 select state_name, city_name, population from cte where rn=1

Output:

state_name city_name population
state1 city1 100

db<>fiddle here

CodePudding user response:

You can find the city in each state with the max population and use it in a sub-query and join it with the tables.

Query

select s.name as state, c.name as city, c.population
from states s
join cities c
on c.state_code = s.code
join (
  select state_code, max(population) as max_pop
  from cities 
  group by state_code
) as p
on p.state_code = c.state_code 
and p.max_pop = c.population;

CodePudding user response:

In most databases your query would not even run, because you are selecting the non-aggregated column c.name without also using it in the GROUP BY clause.

For MySql, the code would run if ONLY_FULL_GROUP_BY mode is disabled, but still it would return wrong results because the query would pick a random city name out of all the cities of each state.

See the demo.

For SQLite, your query is correct!
SQLite's feature of bare columns, makes sure that the city name you get in the results is the one that has the max population.
This is non-standard, but it is documented.
The only problem here is that if there are 2 or more cities with the same max population you will get only one of them in the results.

See the demo.

  • Related