Retrieve LSN of Postgres database during transaction-CodePudding

In Postgres, I am taking out a REPEATABLE READ transaction in order to obtain a consistent view of the database at the moment the transaction is begun. I'd like to know the LSN from the POV of this transaction so that I can also setup a replication slot at this LSN at the same tie so that once I finish with the transaction I can setup logical replication at the LSN and receive all updates to the database that happened after the transaction begun.

My expectation was that the LSN wouldn't change once inside the transaction (as other connections were making updates, etc) however multiple calls pg_current_wal_lsn in the transaction resulted in a different LSN each time.

Is there a way to determine the last LSN as seen from the POV of the transaction?

For a bit more context, I'd like to setup logical replication on a database but must first operate on the data that exists in the database prior to setting up the replication slot. I must assume that previous WAL segments have been purged so I don't expect to see all data in the database via logical replication and as a result need a way to first operate on existing data then stream everything past that. Hopefully that makes sense.

Thanks in advance.

CodePudding user response：

What is fixed for a REPEATABLE READ transaction is not the WAL position, since such a transaction can perform data modifications. It is the snapshot, which determines which row versions it can see.

A snapshot consists of a minimal transaction ID (the transaction can see any rows created by older transactions), a maximal transaction ID (the transaction can see nothing newer than that) and the list of the IDs of all concurrent transactions.

Now if you have a REPEATABLE READ READ ONLY transaction, it makes sense to ask for the position where WAL is inserted at the time the snapshot is taken, so you could query

SELECT pg_current_wal_insert_lsn();

as the first statement in your transaction.

However, there is a race condition: first PostgreSQL takes the snapshot, then it executes the query. Between those times a concurrent transaction could perform data modifications that are not visible to the snapshot, but before the LSN you get from the function.

The solution is to use logical decoding. As the documentation says:

When a new replication slot is created using the streaming replication interface (see CREATE_REPLICATION_SLOT), a snapshot is exported (see Section 9.27.5), which will show exactly the state of the database after which all changes will be included in the change stream. This can be used to create a new replica by using SET TRANSACTION SNAPSHOT to read the state of the database at the moment the slot was created. This transaction can then be used to dump the database's state at that point in time, which afterwards can be updated using the slot's contents without losing any changes.

So you do it the other way around: first you create a logical replication slot, then you start a REPEATABLE READ transaction and set its snapshot, so that it sees the correct data.