Home > other >  Mysql MAX and GROUP BY
Mysql MAX and GROUP BY

Time:11-26

I'm wondering why this query doesn't give me the expected (by me) result. I have a table like this, with many rows. For each city there is a unique geonameid. Each city could have it's version in different languages.

alternatenameId | geonameid | isoLanguage | alternateName | isPreferredName
1554355            5128581        en          Big Apple
1614284            5128581        en        New York City   
9138713            5128581        en          New York            1

This is the query (I limited to one city to make an example, but I'll extend to more cities):

SELECT geonameid, 
       alternateName, 
       MAX(isPreferredName) 
FROM alternatename_new 
WHERE geonameid=5128581 
AND isoLanguage = 'en' 
GROUP BY geonameid

This is the query result

 geonameid | alternateName | isPreferredName
  5128581     Big Apple            1

But i expect to have this

 geonameid | alternateName | isPreferredName
  5128581      New York            1

What I'm doing wrong?

CodePudding user response:

If you were to do the following query, which alternateName would you expect to see in the result?

SELECT geonameid, alternateName, 
  MAX(isPreferredName),
  MIN(isPreferredName) 
FROM alternatename_new 
WHERE geonameid=5128581 and isoLanguage = 'en' 
GROUP BY geonameid

Should it be "Big Apple" or "New York"? That is, the alternateName that is found in the row that has the max value or min value?

What if multiple rows in the group all tied for the isPreferredName value, but had different alternateName values on each row? Which one should be returned?

What if you were using another aggregate function that doesn't give a value that occurs on any of the rows? E.g. COUNT() or AVG()?

The answer is that SQL aggregate functions produce a result, but the other columns in your query aren't related to the result of an aggregate function.

In MySQL, the other column alternateName is basically arbitrary. Its value comes from one of the rows in the group, but not the row where the max was found.

In practice, MySQL happens to return the value from the first row in the group, with respect to the index order it read the rows. But this is not standard, and not guaranteed. It's just a coincidence of the way the MySQL code was written.

In SQLite, for example, the value comes from the last row in the group.

In other brands of SQL database, as well as in the SQL standard, this sort of ambiguous query is not legal. MySQL permits it unless you set a more strict SQL mode. The strict SQL mode is enabled by default in recent versions of MySQL, and this is a good thing.

You should also read my answers to:

CodePudding user response:

It seems that isPreferredName could be a boolean with 1 for true and 0 for false. If so, just add isPreferredName = 1 to the WHERE clause:

SELECT
  geonameid,
  alternateName,
  isPreferredName
FROM
  alternatename_new
WHERE
  isoLanguage = 'en' AND isPreferredName = 1

CodePudding user response:

Your query don't use properly group by
in most recent version of mysql (>5.6) produce error(by default ) on other version of for only_group_by_mode seto to false produce unpredictable result fro your expetced result you should use

    select  a.geonameid, b.alternateName, a.max_id
    from (SELECT geonameid, MAX(isPreferredName) max_id
    FROM alternatename_new 
    WHERE geonameid=5128581 and isoLanguage = 'en' 
    GROUP BY geonameid ) a 
    inner join alternatename_new b on a.geonameid = b.geonameid and a.max_id = b.isPreferredName
  • Related