I was wondering what happens to the binlog when run an alter using pt-online-schema-change
or gh-ost
?
for the pt-online-schema-change
I have read that it copies the table and use some triggers to apply the changes. I don't know if it create a table from the beginning with the new schema or it just apply the alter after copying the table?
if it alters the table from the beginning, then what happens to binglog? is the positions different than the previous binglog?
CodePudding user response:
pt-online-schema change copies the table structure and applies the desired ALTER TABLE to the zero-row table. This is virtually instantaneous. Then it creates triggers to mirror changes against the original table. Then it starts copying old data from the original table to the new table.
What happens to the binlog? It gets quite huge. The CREATE TABLE and ALTER TABLE and CREATE TRIGGER are pretty small. DDL is always statement-based in the binlog. The DML changes created by the triggers and the process of copying old data become transactions in the binlog. We prefer row-based binlogs, so these end up being pretty bulky.
gh-ost is similar, but without the triggers. gh-ost reads the binlog to find events that applied to the old table, and it applies those to the new table. Meanwhile, it also copies old data. Together these actions result in a similar volume of extra events in the binlog as occur when using pt-online-schema-change.
So you should check the amount of free disk space before you begin either of these online schema change operations. It will expand the binlogs approximately in proportion to the amount of data to be copied. And of course you need to store two copies of the whole table — the original and the altered version — temporarily, until the original table can be dropped at the end of the process.
I have had to run pt-online-schema change on large tables (500GB ) when I had a disk that was close to being full. It causes some tense moments. I had to PURGE BINARY LOGS periodically to get some more free space, because the schema change would fill the disk to 100% if I didn't! This is not a situation I recommend.