How to reduce the number of WAL files generated in PostgreSQL


A huge pile of WAL files is generated in our master-standby replication setup. The WAL files are archived on one of the standby nodes, and every 2 hours we use tar to compress the archived WALs on that standby. Still, they add up to a huge amount of storage. For 30- or 90-day backups this becomes a serious storage issue, and it also takes more time to download and replay the WALs during restoration.

I have set the following options:

wal_level=replica
wal_compression=on
archive_mode = always

And the following parameters are commented out / not set:

archive_timeout
checkpoint_timeout

Is there any other way we can reduce the number of WALs generated, or an easier way to manage them? pg_waldump shows that around 70-90% of the data is full-page images.

Also, can I put the above parameters into effect by changing them on the standby node? Is the standby archiving the same WALs sent by the master, or is it regenerating them based on the standby's configuration?
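For reference, here is how the relevant settings can be checked on each node, a quick sketch against pg_settings (listing only the parameters mentioned above):

SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('wal_level', 'wal_compression', 'archive_mode',
               'archive_timeout', 'checkpoint_timeout', 'max_wal_size');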

-- Update: modified to the values below

        name        | setting | unit
--------------------+---------+------
 archive_timeout    | 0       | s
 checkpoint_timeout | 3600    | s
 checkpoint_warning | 3600    | s
 max_wal_size       | 4000    | MB
 min_wal_size       | 2000    | MB
 shared_buffers     | 458752  | 8kB
 wal_buffers        | 4096    | 8kB
 wal_compression    | on      |
 wal_level          | replica |

I am still seeing 3-4 WAL files generated every minute. I am making these changes on the hot standby node (the one the backup is taken from). Should I change this on the master instead? Do the master's settings have any effect on the standby's WAL generation?
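To check how fast WAL is actually being written (and therefore how large max_wal_size would need to be for a one-hour checkpoint interval), a rough sketch is to compare WAL positions taken some time apart on the master; the LSN below is only an illustration, and on a standby pg_last_wal_replay_lsn() would be used instead:

-- note the current write position on the master
SELECT pg_current_wal_lsn();                      -- e.g. 498/B2000000

-- an hour later, see how much WAL was written since that snapshot
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), '498/B2000000'::pg_lsn)
       ) AS wal_written_in_last_hour;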

Example pg_waldump showing FPI size=87%

pg_waldump --stats 0000000100000498000000B2
Type                                           N      (%)          Record size      (%)             FPI size      (%)        Combined size      (%)
----                                           -      ---          -----------      ---             --------      ---        -------------      ---
XLOG                                           1 (  0.00)                  114 (  0.01)                    0 (  0.00)                  114 (  0.00)
Transaction                                 3070 ( 10.35)               104380 (  4.86)                    0 (  0.00)               104380 (  0.63)
Storage                                        0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
CLOG                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Database                                       0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Tablespace                                     0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
MultiXact                                      0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
RelMap                                         0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Standby                                        2 (  0.01)                  100 (  0.00)                    0 (  0.00)                  100 (  0.00)
Heap2                                        590 (  1.99)                33863 (  1.58)                46192 (  0.32)                80055 (  0.48)
Heap                                        6679 ( 22.51)               578232 ( 26.92)              4482508 ( 30.92)              5060740 ( 30.41)
Btree                                      19330 ( 65.14)              1430918 ( 66.62)              9967524 ( 68.76)             11398442 ( 68.48)
Hash                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Gin                                            0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Gist                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Sequence                                       0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
SPGist                                         0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
BRIN                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
CommitTs                                       4 (  0.01)                  120 (  0.01)                    0 (  0.00)                  120 (  0.00)
ReplicationOrigin                              0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Generic                                        0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
LogicalMessage                                 0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
                                        --------                      --------                      --------                      --------
Total                                      29676                       2147727 [12.90%]             14496224 [87.10%]             16643951 [100%]

CodePudding user response:

wal_compression=on

This may be counter-productive. This type of compression has to compress each WAL record in isolation, without the larger context, so it is not very effective. Worse, when you then recompress whole WAL files offline, where the compressor does have access to the larger context, the first round of attempted compression interferes with the better-situated compression attempt.

For example, if I take the WAL from 1,000,000 pgbench transactions, they occupy 889192448 raw bytes without wal_compression, and 637534208 with it.

But then, after passing them through 'xz' (a very slow but very thorough compressor), the first set takes 129393020 bytes while the second one takes 155769400. So turning on compression too soon cost me about 20% more space.

You could use pg_waldump --stats ... on some WAL files to see what is actually in them. If it is mostly FPI, then you could try to space the checkpoints further apart to reduce the FPI frequency. But if you don't have much FPI to start with, that would be ineffective. If you can isolate what is causing so much WAL, maybe you can do something about it. For example, if you do a lot of degenerate updates where a column is set to the same value it already had, adding a WHERE clause to suppress those cases could spare you a lot of WAL generation, as in the sketch below.
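A minimal sketch of that last point (the table and column names here are hypothetical, not from the question):

-- only perform the update when the value actually changes;
-- a no-op update would still dirty the page and generate WAL (including FPIs)
UPDATE accounts
   SET status = 'active'
 WHERE id = 42
   AND status IS DISTINCT FROM 'active';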

CodePudding user response:

The WAL being generated is a reflection of your primary machine's activity. Increasing checkpoint_timeout will help reduce your overall WAL volume, making it easier to process the WAL files.

Standby archiving processes the logs exactly as sent by the primary; they are binary identical. Is it a cold standby, or are you replaying the logs on the standby as they are sent?

CodePudding user response:

Since a high percentage of your WAL consists of full page images, you can reduce the amount of WAL considerably by having checkpoints less often. A full page image is written to WAL whenever a page becomes dirty for the first time after a checkpoint. The price you have to pay is a longer crash recovery time.

To reduce the rate of checkpoints, change these parameters:

  • checkpoint_timeout (default 5 minutes): set it to something high like 1 hour

  • max_wal_size (default 1GB): set it higher than the amount of WAL that is written within one hour, to match the checkpoint_timeout setting

These settings have to be made on the primary server, where WAL is generated, not on the standby. Best practice is to use the same settings on both servers.
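As a rough sketch of applying this on the primary (the max_wal_size value is an assumption; match it to the WAL volume you actually measure per hour):

-- run on the primary; both parameters only need a reload, not a restart
ALTER SYSTEM SET checkpoint_timeout = '1h';
ALTER SYSTEM SET max_wal_size = '8GB';    -- assumed value, size it to one hour of WAL
SELECT pg_reload_conf();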
