Postgres inconsistent insertion performance


I use JDBC to insert rows into Postgres. For some reason, performance varies between runs: I have observed it switching between two levels, "fast" and roughly 10x slower.

I've reproduced the issue with Postgres 14.2, 14.3, and 13.6; it appears in all of them.

What could possibly be the reason for this behavior?

MRE (Java 17):

package test;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.DriverManager;

public class App {
  private static final Logger log = LoggerFactory.getLogger(App.class);

  public static void main(String[] args) throws Exception {
    Class.forName("org.postgresql.Driver");
    try (Connection connection = DriverManager.getConnection("jdbc:postgresql://172.17.0.3:5432/postgres", "postgres", "postgres")) {
      connection.setAutoCommit(false);
      var uploadedCount = 0;
      final var insertA = connection.prepareStatement("insert into A(AA) values(?)");
      final var insertB = connection.prepareStatement("insert into B(BA, BB) values(?, ?)");
      final var startTime = System.currentTimeMillis();
      var timeA = 0;
      var timeB = 0;
      var batchSize = 0;
      var batchesCount = 0;
      for (long i = 0; i < 16_384; i++) {
        insertA.setObject(1, i);
        insertA.addBatch();
        insertB.setObject(1, i);
        insertB.setObject(2, i);
        insertB.addBatch();
        batchSize++;
        if (batchSize == 32) {
          // flush both batches, timing each table's inserts separately
          var start = System.currentTimeMillis();
          insertA.executeBatch();
          timeA += System.currentTimeMillis() - start;
          start = System.currentTimeMillis();
          insertB.executeBatch();
          timeB += System.currentTimeMillis() - start;
          uploadedCount += batchSize * 2;
          batchSize = 0;
          batchesCount++;
          if (batchesCount % 256 == 0) {
            log.info("INSERT A CUMULATIVE TIME: {}", timeA);
            log.info("INSERT B CUMULATIVE TIME: {}", timeB);
            log.info("ROWS PER SEC: {}", uploadedCount * 1000L / (System.currentTimeMillis() - startTime));
          }
        }
      }
    }
  }
}

MRE schema:

CREATE TABLE A (AA BIGINT NOT NULL, CONSTRAINT PK_AA PRIMARY KEY (AA));

CREATE TABLE B (BA BIGINT NOT NULL, BB BIGINT NOT NULL, CONSTRAINT PK_BA_BB PRIMARY KEY (BA, BB));
ALTER TABLE B ADD CONSTRAINT FK_BA_A FOREIGN KEY (BA) REFERENCES A (AA);
ALTER TABLE B ADD CONSTRAINT FK_BB_A FOREIGN KEY (BB) REFERENCES A (AA);

MRE slow run example logs:

2022-05-31 14:50:45 I main App INSERT A CUMULATIVE TIME: 108
2022-05-31 14:50:45 I main App INSERT B CUMULATIVE TIME: 4395
2022-05-31 14:50:45 I main App ROWS PER SEC: 3623
2022-05-31 14:50:58 I main App INSERT A CUMULATIVE TIME: 197
2022-05-31 14:50:58 I main App INSERT B CUMULATIVE TIME: 17046
2022-05-31 14:50:58 I main App ROWS PER SEC: 1896

MRE fast run example logs:

2022-05-31 14:51:02 I main App INSERT A CUMULATIVE TIME: 100
2022-05-31 14:51:02 I main App INSERT B CUMULATIVE TIME: 269
2022-05-31 14:51:02 I main App ROWS PER SEC: 43115
2022-05-31 14:51:03 I main App INSERT A CUMULATIVE TIME: 177
2022-05-31 14:51:03 I main App INSERT B CUMULATIVE TIME: 509
2022-05-31 14:51:03 I main App ROWS PER SEC: 46811

I have not observed the phenomenon with this schema:

CREATE TABLE A (AA BIGINT NOT NULL, CONSTRAINT PK_AA PRIMARY KEY (AA));

CREATE TABLE B (BA BIGINT NOT NULL, CONSTRAINT PK_BA PRIMARY KEY (BA));
ALTER TABLE B ADD CONSTRAINT FK_BA_A FOREIGN KEY (BA) REFERENCES A (AA);

CodePudding user response:

Your program disables autocommit and never commits anything, so each time you run it the tables start out empty. Whether they have no rows at all, or no committed rows but lots of in-progress/aborted ones, and whether the planner knows about this, depends on when vacuum or analyze last ran.
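
You can check which of those states you are in from the statistics views (a quick sketch; pg_stat_user_tables and these columns exist in all the versions tested above, and the MRE's unquoted identifiers fold to lower case):

-- Live vs. dead (in-progress/aborted) row estimates, and when
-- vacuum/analyze last ran for the two MRE tables.
SELECT relname, n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('a', 'b');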

If table "a" starts out truly empty, the planner will think the most efficient way to verify the foreign-key constraint is a seq scan on "a" rather than an index scan. That starts out fine, but each check has to dig through more and more of the run's own uncommitted cruft, so the inserts get progressively slower.
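
You can approximate the plan the FK check uses with EXPLAIN (a sketch only; this mimics the internal referential-integrity lookup, it is not the exact statement Postgres runs):

-- With empty-table statistics this tends to show a Seq Scan on "a";
-- once statistics reflect the inserted rows, it should switch to an
-- index scan on pk_aa.
EXPLAIN SELECT 1 FROM a WHERE aa = 42 FOR KEY SHARE;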

I can kick it into fast mode by manually running ANALYZE a, b in a separate session while the first one is still running (although whether this works might depend on the version; I think in older versions one session's ANALYZE would ignore another session's uncommitted rows).
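
For concreteness, the command to run in the second session (listing both tables in one ANALYZE works on the versions tested here):

ANALYZE a, b;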
