I have recently upgraded a MySQL data store from some ungodly many-years-out-of-date version to 8.0.26.
In one particular table I store dates associated with each record, but occasionally there are as-yet-unknown future dates. These have always been stored in the format YYYY-MM-??, so the field type is VARCHAR(10) rather than DATE, as would be expected if it was possible to always be exact. The field data is otherwise reliably YYYY-MM-DD.
However, queries to order this data have recently stopped working as expected, with MySQL reckoning that such an unknown date should be ordered BEFORE an exact date.
A query boils down to something like this: SELECT * FROM table WHERE date_field <= CURDATE()
(Today is 3rd December, so CURDATE
is evaluating as 2021-12-03. The same occurs when using the literal string value 2021-12-03 rather than the CURDATE
function, so it's definitely a sorting issue rather than clash between data types.)
In those old MySQL versions previously running, 2021-12-?? would evaluate higher/greater than an exact date like 03, and thus not be returned. This would also be expected in line with ASCII sort ordering. Now, however, any such ?? records are also returned, the question mark character apparently being sorted as before/less than a digit.
For the moment I can force the correct and expected behaviour by utilising REPLACE
in my query, but this is process-heavy, ugly and inconvenient: SELECT * FROM table WHERE REPLACE(date_field , '??', '99') <= CURDATE()
Can anyone shed some light on why this is occurring and how I might correct it? It is presumably a MySQL bug, given the standard ASCII ordering and the previous experience (of many years standing) of it working correctly?
EDIT: Thanks to the initial replies pointing me to collation. The database uses almost entirely plain English with only occasional accents (etc), so I've rarely had to touch the default settings in the past.
As per ProGu and Álvaro González's responses, I've begun digging around and test queries without the real table/database involved do indeed return as suggested. However, as soon as I attempt to run anything on the real table, it's still not behaving as expected.
The table is on InnoDB, and all tables and (textual) fields across the database are utf8mb4/utf8mb4_0900_ai_ci. I have tried forcing the collation both at query level and by changing the actual table and field collation, yet that pesky 2021-12-??
is always returned, no matter which I choose. I have attempted various query formats to no avail:
SELECT * FROM table WHERE date_field <= CURDATE() ORDER BY date_field COLLATE utf8mb4_0900_ai_ci DESC
SELECT * FROM table WHERE date_field COLLATE utf8mb4_0900_ai_ci <= CURDATE() COLLATE utf8mb4_0900_ai_ci ORDER BY date_field COLLATE utf8mb4_0900_ai_ci DESC
Test based on Álvaro's code, correctly returning 2021-12-03
:
with sample_data (sample_value) as (
select '2021-12-??'
union all select '2021-12-03'
)
select *
from sample_data
where sample_value <= CURDATE()
order by sample_value COLLATE utf8mb4_0900_ai_ci DESC LIMIT 1;
Is my collation inexperience showing; have I missed something really obvious?
CodePudding user response:
This is a collation issue. You're probably relying on the default collation and that has changed.
You can change the collation at query level to figure out which ones suits your need and then adjust the table or column collation accordingly:
with sample_data (sample_value) as (
select '?'
union all select '0'
)
select *
from sample_data
order by sample_value COLLATE utf8mb4_bin;
Result |
---|
0 |
? |
with sample_data (sample_value) as (
select '?'
union all select '0'
)
select *
from sample_data
order by sample_value COLLATE utf8mb4_0900_as_cs;
Result |
---|
? |
0 |
Please note I mean collation and not encoding. You should be able to keep your current encoding if it isn't UTF-8.
CodePudding user response:
I think you will need to change the code a little. It is unclear which of the cases below apply to your code.
mysql> SELECT "2021-12-??" < "2021-12-03";
-----------------------------
| "2021-12-??" < "2021-12-03" |
-----------------------------
| 1 |
-----------------------------
mysql> SELECT "2021-12-??" < CURDATE();
ERROR 1525 (HY000): Incorrect DATE value: '2021-12-??'
mysql> SELECT "2021-12-??" < CAST(CURDATE() AS CHAR);
----------------------------------------
| "2021-12-??" < CAST(CURDATE() AS CHAR) |
----------------------------------------
| 1 |
----------------------------------------
mysql> SELECT DATE(NOW()) <= CAST(CURDATE() AS CHAR);
----------------------------------------
| DATE(NOW()) <= CAST(CURDATE() AS CHAR) |
----------------------------------------
| 1 |
----------------------------------------