MySQL partition by UUID column


I have a table in MySQL that I need to partition on a column containing a UUID (version 1: a combination of a timestamp and a MAC address).

The table has 3 billion rows and 6 TB of data in MySQL Aurora, and it is expected to grow rapidly in the future, roughly a 50% increase per year.

CREATE TABLE `org_info` (
  `ID` varchar(40) NOT NULL,  -- UUID
  `ORGNAME` varchar(255) DEFAULT NULL,
  `DATE_TIME` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

I am looking to partition this table based on the ID column, since it is a single table and will grow very fast in the future.

I am very new to databases, so I am looking for help: how can we partition this table based on the UUID column?

CodePudding user response:

You could try HASH or KEY partitioning. More information is in the MySQL manual's chapter on partitioning types.

The text representation of a UUID uses hex digits, so you could use the last digit(s) as the basis of a partition key.
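For example, here is a minimal sketch of KEY partitioning on the table from the question. In practice, KEY partitioning hashes the entire column value, which achieves the same even spread without extracting digits; it is allowed here because `ID` is the primary key, so every unique key contains the partitioning column. The partition count of 16 is an arbitrary placeholder:

CREATE TABLE `org_info` (
  `ID` varchar(40) NOT NULL,  -- UUID
  `ORGNAME` varchar(255) DEFAULT NULL,
  `DATE_TIME` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY KEY (`ID`)
PARTITIONS 16;  -- arbitrary example; pick based on data size and access patterns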

CodePudding user response:

UUIDs, in their native format, are terrible as a key into a huge table. This is because UUID values jump around a lot. When a row is looked up, the block (16KB) with that row is loaded. There may be a hundred other rows in that block (depending on row size and phase of the moon). Because of the randomness, it is quite likely that that block will be bumped out of cache before any of the other rows are used. This makes the caching largely useless. Hence, the processing is very I/O-bound.

Add in Partitioning, and nothing changes. Add in hashing of the UUIDs, and you are randomizing something that is already randomized -- no improvement. On the other hand, if you partition on date and can restrict the queries to less than two partitions' worth of data, partition pruning may help. (We can discuss further.)
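As a hedged sketch of what date-based partitioning could look like: MySQL requires every unique key to include the partitioning column, so the primary key has to be widened to (ID, DATE_TIME), which is a real schema change. The partition boundaries below are arbitrary examples:

CREATE TABLE `org_info` (
  `ID` varchar(40) NOT NULL,  -- UUID
  `ORGNAME` varchar(255) DEFAULT NULL,
  `DATE_TIME` datetime(6) NOT NULL,
  PRIMARY KEY (`ID`, `DATE_TIME`)  -- must include the partitioning column
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE COLUMNS (`DATE_TIME`) (
  PARTITION p2021 VALUES LESS THAN ('2022-01-01'),
  PARTITION p2022 VALUES LESS THAN ('2023-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

A query with WHERE DATE_TIME >= '2022-03-01' AND DATE_TIME < '2022-04-01' would then touch only partition p2022 (partition pruning).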

If you have many terabytes of data but only a fraction of that in RAM, you would expect almost every read of one row to require one I/O operation. And there are not that many IOPs available, even with SSDs, and even with RAID striping.

Writes are just as bad -- the next UUID that comes in will have to read-modify-write some block that is probably not in cache (the "buffer_pool").

There is one situation where UUIDs can be less costly, but it involves Type-1 UUIDs and access patterns that are roughly chronological. I discuss that in my blog on UUIDs. MySQL 8.0 has incorporated some of that. MariaDB 10.7 (not yet GA) makes UUID a native datatype. (I do not know whether Aurora has picked up any of these improvements. Fall back on my blog if it has not.)

If that use case does not apply, please describe your app and its use of UUIDs. Also, if there are multiple UUIDs, let's talk about all of them -- each UUID-based index has a similar performance problem.

One small improvement can be had by shrinking the 36-character UUID down to BINARY(16) (16 bytes). My blog explains how to do that; the versions mentioned above do the equivalent. In any case, you will have to change code to shrink the data.
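A sketch of that conversion using MySQL 8.0's built-in UUID_TO_BIN()/BIN_TO_UUID() (assuming your server version has them; verify Aurora support first). Passing 1 as the second argument swaps the time fields of a version-1 UUID to the front, so roughly chronological inserts cluster instead of scattering. The table name org_info_bin is hypothetical, for illustration:

-- Store the UUID as 16 bytes instead of 36 characters:
CREATE TABLE `org_info_bin` (   -- hypothetical name for illustration
  `ID` BINARY(16) NOT NULL,
  `ORGNAME` varchar(255) DEFAULT NULL,
  `DATE_TIME` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB;

-- Convert on the way in (the 1 rearranges version-1 time fields):
INSERT INTO org_info_bin (ID, ORGNAME)
VALUES (UUID_TO_BIN('f81d4fae-7dec-11d0-a765-00a0c91e6bf6', 1), 'Acme');

-- Convert back on the way out, with the same flag:
SELECT BIN_TO_UUID(ID, 1) AS id, ORGNAME FROM org_info_bin;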

Based on what you have said in your Question, a simple BIGINT AUTO_INCREMENT (8 bytes) might work, and be better than any UUID. Again, Partitioning is not justifiable.
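A hedged sketch of that alternative follows. If callers still need the UUID for external lookups, it can be kept as a secondary column, but note that the UNIQUE index on it would still suffer the random-access problem described above:

CREATE TABLE `org_info` (
  `ID` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 8 bytes, monotonically increasing
  `UUID` BINARY(16) NOT NULL,                    -- keep only if callers still need it
  `ORGNAME` varchar(255) DEFAULT NULL,
  `DATE_TIME` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `uk_uuid` (`UUID`)
) ENGINE=InnoDB;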
