PG WAL: analysis of the physical storage

Time: 12-01



Original link: http://www.postgres.cn/v2/news/viewone/1/385?tdsourcetag=s_pcqq_aiomsg


The transaction log is an important part of a database. It stores the history of all changes and operations in the database system and ensures that data is not lost when a failure occurs (such as a power outage or another cause of a server crash). In PostgreSQL (hereinafter PG), the transaction log files are called the Write-Ahead Log (hereinafter WAL).

This article briefly reviews the structure of the transaction log files in PG, covering basic terminology, the internal structure of WAL segment files, the content of XLOG records and their in-memory organization, and an introduction to the pg_waldump tool. This is the first part, which covers basic WAL terminology, the composition of WAL files, and the internal structure of a WAL segment file.

I. Basic WAL terminology
To make WAL easier to understand and discuss, we first briefly introduce the related terms.

1. REDO log

The REDO log records every change: before a change is written to a data file, it is first written to the REDO log. Its purpose is to store the history of all database changes, which is used for database recovery, incremental backup, PITR (Point-In-Time Recovery) and replication.

2. WAL segment file

For ease of management, PG divides the transaction log into N segments, and each segment is called a WAL segment file. By default, each WAL segment file is 16 MB.

3. XLOG record

This is a logical concept: every change in PG can be thought of as corresponding to an XLOG record. These XLOG records are stored in WAL segment files, and PG reads them during crash recovery and PITR.

4. WAL buffer

The WAL buffer is an in-memory buffer. Both WAL segment file headers and XLOG records are first written to the WAL buffer, and at the "right time" they are written out to the WAL segment files by the WAL writer process.

5. LSN

LSN stands for Log Sequence Number. It indicates the position in the transaction log at which an XLOG record is written. An LSN is an unsigned 64-bit integer (uint64), and within the transaction log LSNs are unique and monotonically increasing.
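To make the uint64 nature of the LSN concrete, here is a minimal Python sketch that converts between the textual "X/Y" form used later in this article (high 32 bits, a slash, then low 32 bits, both in hex) and the underlying 64-bit integer. The helper names parse_lsn and format_lsn are hypothetical, not PG functions, and the sketch assumes the plain uppercase-hex "%X/%X" style.

```python
def parse_lsn(text: str) -> int:
    """Turn an 'X/Y' LSN string into its uint64 value (hypothetical helper)."""
    hi_text, lo_text = text.split("/")
    hi, lo = int(hi_text, 16), int(lo_text, 16)
    if not (0 <= hi <= 0xFFFFFFFF and 0 <= lo <= 0xFFFFFFFF):
        raise ValueError(f"LSN half out of 32-bit range: {text!r}")
    # High 32 bits come before the slash, low 32 bits after it.
    return (hi << 32) | lo


def format_lsn(value: int) -> str:
    """Render a uint64 LSN back into the 'X/Y' form (uppercase hex)."""
    if not 0 <= value < 2 ** 64:
        raise ValueError("LSN must fit in an unsigned 64-bit integer")
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


lsn = parse_lsn("1/4288E228")
print(hex(lsn))          # 0x14288e228
print(format_lsn(lsn))   # 1/4288E228
# Because LSNs increase monotonically, plain integer comparison orders records.
print(parse_lsn("0/FFFFFFFF") < lsn)  # True
```

Keeping the LSN as a single integer is what makes "later in the log" a simple numeric comparison.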

6. checkpointer

The checkpointer is a PG background process that performs checkpoints periodically. When it performs a checkpoint, it writes an XLOG record containing the checkpoint information into the current WAL segment file; that record contains the location of the latest REDO point.

7. checkpoint

A checkpoint is performed by the checkpointer process. The main steps are as follows:

Determine the REDO point;
Flush dirty pages to disk;
Write an XLOG record containing the checkpoint information, including the REDO point, to the WAL segment file (see the CheckPoint structure for details);
Update pg_control with the REDO point and related information.

8. REDO point

The REDO point is where PG starts replaying WAL during recovery. It is the position in the transaction log at which the next XLOG record would be written at the moment the most recent checkpoint started, i.e. the end of the transaction log at that moment (here, "position" can be understood as an offset into the transaction log).

9. pg_control

pg_control is a physical file on disk that stores basic checkpoint information used during database recovery. Its contents can be viewed with the pg_controldata command.

II. WAL files
As mentioned earlier, the transaction log stores the history of all changes and operations in the database system. As the database keeps running, the transaction log keeps growing, so is its size limited? In PG, the answer is yes: there is a size limit.

PG uses a 64-bit unsigned integer (uint64) as the address space of the transaction log, so in theory the PG transaction log can grow to 2^64 bytes (16 EB). How large is that? Suppose a very busy database produces 16 TB of log per day; reaching the transaction log size limit would then take (1024 × 1024 days) / 365 ≈ 2,873 years. In other words, although there is a limit, it is more than enough for now.
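The time-to-exhaustion estimate above can be checked with a few lines of Python. The sketch assumes binary units (16 EB = 2^64 bytes, 16 TB = 16 × 2^40 bytes) and a 365-day year, which is how the figure of 1024 × 1024 days falls out.

```python
# Address space of the transaction log: an unsigned 64-bit LSN.
TOTAL_BYTES = 2 ** 64                  # 16 EiB
# Assumed write rate from the example: 16 TiB of WAL per day.
DAILY_BYTES = 16 * 2 ** 40

days = TOTAL_BYTES // DAILY_BYTES      # 2**64 / 2**44 = 2**20
years = days / 365

print(days)          # 1048576  (= 1024 * 1024)
print(round(years))  # 2873
```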

Obviously, the OS cannot manage a single 16 EB file efficiently, so PG divides the transaction log into N WAL segment files of 16 MB each (by default). The overall structure is shown in the figure below:



Figure 1: Overall structure of the transaction log

1. WAL segment file
The name of a WAL segment file is 24 characters long and consists of three parts of eight characters each; every character is a hexadecimal digit (0 to F). The parts are interpreted as follows (assuming a WAL segment file size of 16 MB):

Part 1 is the TimeLineID, ranging from 0x00000000 to 0xFFFFFFFF;
Part 2 is the logical file ID, ranging from 0x00000000 to 0xFFFFFFFF;
Part 3 is the physical file ID, ranging from 0x00000000 to 0x000000FF.
The logical file ID, the physical file ID and the file size together make up the 64-bit address space:

The logical file ID is 32 bits (uint32, unsigned 32-bit integer);
The physical file ID is 8 bits (uint8);
The 16 MB file size corresponds to a 24-bit offset;
Together they add up to 64 bits (32 + 8 + 24), giving a 64-bit address space for the WAL files.

2. About the LSN
An LSN indicates the position in the transaction log files at which an XLOG record is written; it can be understood as the offset of the XLOG record within the transaction log.

An LSN consists of three parts: the logical file ID, the physical file ID and the offset within the file. For example, in LSN 1/4288E228, 1 is the logical file ID, 42 is the physical file ID, and 88E228 is the offset within the WAL segment file (note: 3 bytes address exactly 16 MB).
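For the default 16 MB segment size, the split described above is just bit slicing on the 64-bit LSN: the upper 32 bits are the logical file ID, the next 8 bits the physical file ID, and the low 24 bits the offset inside the segment. A minimal sketch follows; the function name split_lsn is hypothetical, and the fixed 8/24-bit split only holds for 16 MB segments.

```python
def split_lsn(lsn: int) -> tuple[int, int, int]:
    """Split a uint64 LSN into (logical file ID, physical file ID, offset).

    Assumes 16 MB WAL segments, i.e. a 24-bit in-segment offset.
    """
    logical_id = lsn >> 32               # upper 32 bits
    physical_id = (lsn >> 24) & 0xFF     # next 8 bits
    offset = lsn & 0xFFFFFF              # lower 24 bits (16 MB = 2**24)
    return logical_id, physical_id, offset


logical_id, physical_id, offset = split_lsn(0x14288E228)  # LSN 1/4288E228
print(f"{logical_id:X} {physical_id:X} {offset:X}")  # 1 42 88E228
```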

Following this rule, given an LSN it is easy to work out the corresponding WAL segment file (assuming the timeline TimeLineID is 1).

For example, LSN 1/4288E228 corresponds to the WAL segment file 000000010000000100000042: the first 8 characters are the timeline ID (00000001), the middle 8 (00000001) are the logical file ID, and the last 8 (00000042) are the physical file ID.
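The file-name rule above can be written as a small function. This Python sketch follows the same arithmetic (segment number = LSN / segment size, then split into logical and physical IDs) and, unlike a fixed 8-bit split, also works for other segment sizes chosen at initdb time. The function name wal_file_name and the default timeline of 1 are assumptions for illustration.

```python
DEFAULT_SEG_SIZE = 16 * 1024 * 1024  # 16 MB default WAL segment size


def wal_file_name(lsn: int, timeline: int = 1,
                  seg_size: int = DEFAULT_SEG_SIZE) -> str:
    """Build the 24-hex-digit WAL segment file name containing `lsn` (hypothetical helper)."""
    segno = lsn // seg_size                      # global segment number
    segs_per_logical = 0x100000000 // seg_size   # segments per 4 GB logical file
    logical_id = segno // segs_per_logical
    physical_id = segno % segs_per_logical
    return f"{timeline:08X}{logical_id:08X}{physical_id:08X}"


print(wal_file_name(0x14288E228))  # 000000010000000100000042
```

For this LSN the result matches the pg_walfile_name query that follows. As far as I know, pg_walfile_name itself treats an LSN lying exactly on a segment boundary as belonging to the previous segment, so the two can differ only in that edge case.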

PG also provides a function that returns the WAL file name for a given LSN:

testdb=# SELECT pg_walfile_name('1/4288E228');
     pg_walfile_name
--------------------------
 000000010000000100000042
(1 row)

III. Internal structure of a WAL segment file
A WAL segment file is 16 MB by default; its internal structure is shown in the figure below:



Figure 2: Internal structure of a WAL segment file

1. WAL segment file
Internally, a WAL segment file is divided into N pages (blocks) of 8192 bytes (8 KB) each. The header of the first page in each WAL segment file corresponds to the XLogLongPageHeaderData structure in the PG source, while the headers of all subsequent pages use XLogPageHeaderData. Each page header is followed by N XLOG records.
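Given the 8 KB page size, locating an LSN inside its segment file is a matter of division. The following Python sketch, assuming the default 16 MB segments and 8192-byte WAL pages, reports which page an LSN falls in, its raw byte offset within that page, and which header structure that page starts with. The name locate_wal_page is hypothetical.

```python
SEG_SIZE = 16 * 1024 * 1024   # default WAL segment size
PAGE_SIZE = 8192              # default WAL page (block) size

PAGES_PER_SEGMENT = SEG_SIZE // PAGE_SIZE  # 2048 pages per 16 MB segment


def locate_wal_page(lsn: int) -> tuple[int, int, str]:
    """Return (page number in segment, byte offset in page, page header struct).

    The offset is a raw byte position and still counts the page header bytes.
    """
    seg_offset = lsn % SEG_SIZE
    page_no, page_offset = divmod(seg_offset, PAGE_SIZE)
    # Only the first page of each segment carries the long header.
    header = "XLogLongPageHeaderData" if page_no == 0 else "XLogPageHeaderData"
    return page_no, page_offset, header


print(PAGES_PER_SEGMENT)              # 2048
print(locate_wal_page(0x14288E228))   # (1095, 552, 'XLogPageHeaderData')
```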

2. XLOG record
An XLOG record consists of two parts: the first is the XLOG record header, which has a fixed size (24 bytes) and corresponds to the XLogRecord structure; the second is the XLOG record data.
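The 24-byte size of the XLOG record header can be reproduced with Python's struct module. The field order below is my reading of XLogRecord in access/xlogrecord.h (xl_tot_len, xl_xid, xl_prev, xl_info, xl_rmid, two padding bytes, xl_crc); the little-endian byte order and the sample values are assumptions for illustration, not taken from a real WAL file.

```python
import struct

# Assumed XLogRecord layout, little-endian (typical x86-64 host):
#   uint32 xl_tot_len  total record length
#   uint32 xl_xid      transaction ID
#   uint64 xl_prev     LSN of the previous record
#   uint8  xl_info     flag bits
#   uint8  xl_rmid     resource manager ID
#   2 bytes padding
#   uint32 xl_crc      CRC of the record
XLOG_RECORD_HEADER = struct.Struct("<IIQBB2xI")


def parse_xlog_record_header(buf: bytes, offset: int = 0) -> dict:
    """Decode the fixed 24-byte header at `offset` in `buf`."""
    tot_len, xid, prev, info, rmid, crc = XLOG_RECORD_HEADER.unpack_from(buf, offset)
    return {"xl_tot_len": tot_len, "xl_xid": xid, "xl_prev": prev,
            "xl_info": info, "xl_rmid": rmid, "xl_crc": crc}


# Round-trip with made-up values to show the layout.
sample = XLOG_RECORD_HEADER.pack(72, 735, 0x14288E1F0, 0x00, 10, 0x1234ABCD)
print(XLOG_RECORD_HEADER.size)                                      # 24
print(parse_xlog_record_header(sample)["xl_prev"] == 0x14288E1F0)   # True
```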

The overall layout of an XLOG record is as follows:

Header data (fixed-size XLogRecord structure)
XLogRecordBlockHeader structure
XLogRecordBlockHeader structure
...
XLogRecordDataHeader[Short|Long] structure
Block data
Block data
...
Main data

XLOG records can be divided into three categories according to the content they store:

Backup block records: store full-page-write images of blocks. This type of record addresses the problem of partial (torn) page writes: the first time a data page is modified after a checkpoint, the whole page is written to the transaction log as a full page write (this is controlled by the full_page_writes parameter, which is on by default);
Tuple data block records: changes to tuples within a page are recorded with this type of record;
Checkpoint records: when a checkpoint occurs, the checkpoint information (including the REDO point) is recorded in the transaction log.
The XLOG record data is where the actual data is stored. It consists of the following parts:

0..N XLogRecordBlockHeader structures, each corresponding to one block of data;
XLogRecordDataHeader[Short|Long]: if the main data is smaller than 256 bytes the Short format is used, otherwise the Long format;
Block data: full-page-write data and tuple data. If compression is enabled (wal_compression), full-page-write data is stored compressed, and metadata about the compressed page is stored in XLogRecordBlockCompressHeader;
Main data: data specific to the record, such as the checkpoint information of a checkpoint record.
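The Short/Long choice for XLogRecordDataHeader comes down to whether the main-data length fits in one byte. The sketch below encodes both variants in Python. The block ID values 255 (XLR_BLOCK_ID_DATA_SHORT) and 254 (XLR_BLOCK_ID_DATA_LONG) are taken from access/xlogrecord.h as I recall them, the little-endian length is an assumption about the host, and encode_main_data_header is a hypothetical name.

```python
import struct

XLR_BLOCK_ID_DATA_SHORT = 255  # header id: 1-byte length follows
XLR_BLOCK_ID_DATA_LONG = 254   # header id: 4-byte length follows


def encode_main_data_header(main_data_len: int) -> bytes:
    """Build XLogRecordDataHeaderShort (2 bytes) or XLogRecordDataHeaderLong (5 bytes)."""
    if main_data_len < 0:
        raise ValueError("length must be non-negative")
    if main_data_len < 256:
        # Short form: block id + uint8 length
        return struct.pack("<BB", XLR_BLOCK_ID_DATA_SHORT, main_data_len)
    # Long form: block id + unaligned uint32 length ("<" disables padding)
    return struct.pack("<BI", XLR_BLOCK_ID_DATA_LONG, main_data_len)


print(encode_main_data_header(3).hex())    # ff03
print(encode_main_data_header(300).hex())  # fe2c010000
```

Saving three bytes per record looks small, but most records carry short main data, so the short form is the common case.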
Taking INSERT as an example, the XLOG record data written when a row is inserted is structured as shown in the figure below:



Figure 3: XLOG record data for a DML statement
