I'm wondering why this query doesn't give me the expected (by me) result. I have a table like this, with many rows. For each city there is a unique geonameid. Each city could have it's version in different languages.
alternatenameId | geonameid | isoLanguage | alternateName | isPreferredName
1554355 5128581 en Big Apple
1614284 5128581 en New York City
9138713 5128581 en New York 1
This is the query (I limited to one city to make an example, but I'll extend to more cities):
SELECT geonameid,
alternateName,
MAX(isPreferredName)
FROM alternatename_new
WHERE geonameid=5128581
AND isoLanguage = 'en'
GROUP BY geonameid
This is the query result
geonameid | alternateName | isPreferredName
5128581 Big Apple 1
But i expect to have this
geonameid | alternateName | isPreferredName
5128581 New York 1
What I'm doing wrong?
CodePudding user response:
If you were to do the following query, which alternateName
would you expect to see in the result?
SELECT geonameid, alternateName,
MAX(isPreferredName),
MIN(isPreferredName)
FROM alternatename_new
WHERE geonameid=5128581 and isoLanguage = 'en'
GROUP BY geonameid
Should it be "Big Apple" or "New York"? That is, the alternateName that is found in the row that has the max value or min value?
What if multiple rows in the group all tied for the isPreferredName
value, but had different alternateName
values on each row? Which one should be returned?
What if you were using another aggregate function that doesn't give a value that occurs on any of the rows? E.g. COUNT() or AVG()?
The answer is that SQL aggregate functions produce a result, but the other columns in your query aren't related to the result of an aggregate function.
In MySQL, the other column alternateName
is basically arbitrary. Its value comes from one of the rows in the group, but not the row where the max was found.
In practice, MySQL happens to return the value from the first row in the group, with respect to the index order it read the rows. But this is not standard, and not guaranteed. It's just a coincidence of the way the MySQL code was written.
In SQLite, for example, the value comes from the last row in the group.
In other brands of SQL database, as well as in the SQL standard, this sort of ambiguous query is not legal. MySQL permits it unless you set a more strict SQL mode. The strict SQL mode is enabled by default in recent versions of MySQL, and this is a good thing.
You should also read my answers to:
CodePudding user response:
It seems that isPreferredName could be a boolean with 1 for true and 0 for false. If so, just add isPreferredName = 1 to the WHERE clause:
SELECT
geonameid,
alternateName,
isPreferredName
FROM
alternatename_new
WHERE
isoLanguage = 'en' AND isPreferredName = 1
CodePudding user response:
Your query don't use properly group by
in most recent version of mysql (>5.6) produce error(by default ) on other version of for only_group_by_mode seto to false produce unpredictable result
fro your expetced result you should use
select a.geonameid, b.alternateName, a.max_id
from (SELECT geonameid, MAX(isPreferredName) max_id
FROM alternatename_new
WHERE geonameid=5128581 and isoLanguage = 'en'
GROUP BY geonameid ) a
inner join alternatename_new b on a.geonameid = b.geonameid and a.max_id = b.isPreferredName