Here is the tutorial of GROUP_CONCAT() in GeeksForGeeks.
In "Queries 2", the Output is ascending. But there is no ORDER BY clause.
here is the picture of "Queries 2"
Could anyone can tell me why? Any help would be really appreciated!
CodePudding user response:
This is one of those oddballs where there is likely an implicit sort happening behind the scenes to optimize the DISTINCT
execution by mysql.
You can test this yourself pretty easily:
CREATE TABLE t1 (c1 VARCHAR(50));
INSERT INTO t1 VALUES ('zebra'),('giraffe'),('cattle'),('fox'),('octopus'),('yak');
SELECT GROUP_CONCAT(c1) FROM t1;
SELECT GROUP_CONCAT(DISTINCT c1) FROM t1;
GROUP_CONCAT(c1)
zebra,giraffe,cattle,fox,octopus,yak
GROUP_CONCAT(DISTINCT c1)
cattle,fox,giraffe,octopus,yak,zebra
It's not uncommon to find sorted results where no ORDER BY was specified. Window Functions output are a good example of this.
You can imagine if you were tasked, as a human, to only pick distinct items from a list. You would likely first sort the list and then pick out duplicates, right? And when you hand the list back to the person that requested this from you, you wouldn't scramble the data back up to be unsorted, I would assume. Why do the extra work? What you are seeing here is a byproduct of the optimized execution path chosen by the mysql server.
The key takeaway is "byproduct". If I specifically wanted the output of GROUP_CONCAT
to be sorted, I would specify exactly what I want and I would not rely on this implicit sorting behavior. We can't guess what the execution path will be. There are a lot of decisions an RDBMS makes when SQL is submitted to optimize the execution and depending on data size and other steps it needs to take in the sql, this behavior may work on one sql statement and not another. Likewise, it may work one day, and not another.
CodePudding user response:
TL;DR Never omit an ORDER BY clause from a query if you rely on the order for something.
Does DISTINCT will automatically sort the result in MySQL?
No. NO! Be careful!
SQL is all about sets of rows. Without ORDER BY clauses, SQL queries return the rows of their result sets in an "unpredictable" order. "Unpredictable" is like random, but worse. If the order is truly random, you have a chance to catch any ordering problem when you're testing. Unpredictable means the server returns rows in any convenient order. This means everything works as you expect until some day in the future when it doesn't, without warning. (MySQL might start using some kind of parallel algorithm in the future.)
Now it is true that DISTINCT result sets from modestly sized tables are often generated using a sorting / deduplicating algorithm in the server. But that is an implementation detail. MySql and other table servers are complex enough that relying on implementation details is not wise. The good news: If you include an ORDER BY clause showing the same order as that methodology generates, usually performance is not changed.
SQL is declarative, not procedural. We specify what we want, not how to get it. It's probably the only declarative language most of us ever see, so it's easy to make the mistake of thinking it is procedural.