The database for an application I manage use UUIDs. These UUIDs are stored as char(36)
with a utf8
character set. From a strict performance point of view, I know this is not optimal. The general recommendation seem to be to use a binary(16)
column for UUIDs. I cannot change the data type, but I can change the character set.
The characters in a UUID can be a digit 0 through 9, or letter a through f.
By changing the character set from utf8
to ascii
, the total size of all indexes for the database will probably be reduced by several gigabytes.
The application connects to the database and explicitly sets character encoding and connection collation in the connection string : characterEncoding=utf8&connectionCollation=utf8
.
What will I have to do (if anything at all) to ensure a safe "conversion" from utf8 to ascii for the UUIDs?
CodePudding user response:
Firstly, you might not get the storage space savings you are hoping for converting it to ASCII. The characters in UUIDs are all one byte in both ASCII and UTF8. I don't know the internals of mysql to say how much space it takes up. I suspect it may be like a varchar and have a few bytes to indicate the length of the string in bytes in which case you may only be saving a few bytes on each record.
If you insist on trying it, it is highly recommended that a backup of your database is taken before attempting such an endeavor. Also, it would be prudent to build a small test database to try out your changes. It is prudent to make sure that the applications that use the database still work after the change. You can change the character set and collation of a table's column using a simple SQL statement. You probably should do the same to all similar columns in the database. Changing it at the table or database level could be disasterous for your data.
You probably should also change the application so it initialises the database in the new way at setup or after a database restore.
In summary, it sounds like a high risk, low reward change to me. If there's little else on the database other than the UUIDs and you switch to binary(16) instead of ASCII, then you could get a meaningful space saving, but that would require application changes as well.
CodePudding user response:
Standard UUIDs (as opposed to home-grown ones) use only hex, which is a subset of ascii characters which is a subset of utf8. The encoding (how the bits are arranged) is identical. Hence, no data loss in the conversion.
One thing to be careful of. If you are JOINing
on a uuid, do make sure the CHARACTER SET
and COLLATION
of the column in both tables is the same. Otherwise, there will be a big performance hit. (MySQL is not smart enough to understand that you have simply hex.)
Also, check the collation -- if it is ..._ci
, then "a" == "A", etc. This would be beneficial if you might need case folding. (..._bin
treats a..f as different than A..F.)
In InnoDB, with either VARCHAR
or CHAR
, ascii or utf8mb4 with virtually any length of at least 36, works identically.
Yes, do have a backup handy, just in case.
For large tables, UUIDs have an unavoidable performance problem. I discuss it here: http://mysql.rjweb.org/doc.php/uuid