Home > Software engineering >  Why is this NOT NULL to NULL migration triggering lots of I/O operations?
Why is this NOT NULL to NULL migration triggering lots of I/O operations?

Time:07-05

I have this MySQL table, with a size of about 120GB:

CREATE TABLE `impressions` (
  `session_uuid` varchar(36) DEFAULT NULL,
  `survey_uuid` varchar(255) NOT NULL,
  `data` text,
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `user_uuid` varchar(255) NOT NULL DEFAULT '',
  `is_test` tinyint(1) NOT NULL DEFAULT '0',
  KEY `impressions_survey_uuid_session_uuid_user_uuid_index` (`survey_uuid`,`session_uuid`,`user_uuid`),
  KEY `impressions_created_at_index` (`created_at`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

I see that this data migration is taking more than 6 hours (on a decent RDS instance where I was able to run more complex migrations) because is doing lots of I/O operations. Why does it have to do so many operations? The only thing I'm changing here is the NULL optionality and the default value.

ALTER TABLE `impressions` CHANGE COLUMN `user_uuid` `user_uuid` VARCHAR(255) null;

CodePudding user response:

Changing the nullability of a column changes the structure of the row in InnoDB. A row stored in InnoDB has a bitmap for each nullable column, to indicate whether that column is in fact NULL on the given row. If you change the nullability of a column, you have changed the length of the bitmap. Therefore every row must be rewritten to a new set of pages.

Changing only the DEFAULT is a metadata-only change.

I've made the mistake of running an ALTER TABLE that should've been a metadata-only change, but I forgot to match the nullability of the original column, and so my ALTER TABLE became a table restructure and took a long time.

If you have to do such changes in MySQL, I suggest you look at one of the open-source online schema change tools: pt-online-schema-change or gh-ost. I've used the former tool to manage many long-running schema changes in production. It usually makes the operation take a little bit longer, but that's not a problem because the table can still be used for both reading and writing while the schema change is in progress.

  • Related