How to properly / efficiently manage entity manager JPA Spring @Transactional for large datasets?

Time:01-18

I am attempting to insert ~57,000 entities into my database, but the insert method takes longer and longer as the loop progresses. I have implemented batches of 25, each time flushing, clearing, and (I'm pretty sure) closing the transaction, without success. Is there something else I need to be doing in the code below to maintain the insert rate? I feel like it should not take 4 hours to insert 57K records.

[Migrate.java]

This is the main class that loops through 'Xaction' entities and adds 'XactionParticipant' records based off each Xaction.

// Use hibernate cursor to efficiently loop through all xaction entities
String hql = "select xaction from Xaction xaction";
Query<Xaction> query = session.createQuery(hql, Xaction.class);
query.setFetchSize(100);
query.setReadOnly(true);
query.setLockMode("xaction", LockMode.NONE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);

int count = 0;
Instant lap = Instant.now();
List<Xaction> xactionsBatch = new ArrayList<>();
while (results.next()) {
    count++;

    Xaction xaction = (Xaction) results.get(0);
    xactionsBatch.add(xaction);

    // save new XactionParticipants in batches of 25
    if (count % 25 == 0) {
        xactionParticipantService.commitBatch(xactionsBatch);
        float rate = ChronoUnit.MILLIS.between(lap, Instant.now()) / 25f / 1000;
        System.out.printf("Batch rate: %.4fs per xaction\n", rate);
        xactionsBatch = new ArrayList<>();
        lap = Instant.now();
    }
}
xactionParticipantService.commitBatch(xactionsBatch);
results.close();

[XactionParticipantService.java]

This service provides a method annotated with "REQUIRES_NEW" in an attempt to commit and close the transaction for each batch.

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void commitBatch(List<Xaction> xactionBatch) {
    for (Xaction xaction : xactionBatch) {
        try {
            XactionParticipant xp = new XactionParticipant();
            // ... create xp based off Xaction info ...

            // Use native query for efficiency
            String nativeQueryStr = "INSERT INTO XactionParticipant .... xp info/data";
            Query q = em.createNativeQuery(nativeQueryStr);
            q.executeUpdate();
        } catch (Exception e) {
            log.error("Unable to update", e);
        }
    }
    // Clear just in case??
    em.flush();
    em.clear();
}

CodePudding user response:

It is not clear what the root cause of your performance problem is: Java memory consumption or DB performance. Please check some thoughts below:

  1. The following code does not actually optimize memory consumption:
String hql = "select xaction from Xaction xaction";
Query<Xaction> query = session.createQuery(hql, Xaction.class);
query.setFetchSize(100);
query.setReadOnly(true);
query.setLockMode("xaction", LockMode.NONE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);

Since you are retrieving fully-functional entities, those entities get stored in the persistence context (the session-level cache). To free memory up, you need to detach each entity once it has been processed (i.e. after xactionsBatch.add(xaction) or // ... create xp based off Xaction info ...). Otherwise, by the end of processing you consume the same amount of memory as if you had done List<Xaction> results = query.getResultList();. And here I'm not sure which is better: consuming all the required memory at the start of the transaction and releasing all other resources, or keeping the cursor and JDBC connection open for 4 hours.
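As a sketch of that idea applied to the scroll loop from the question: after each batch is committed, evict the just-processed entities from the session so the persistence context does not grow across all ~57,000 rows (`session` here is assumed to be the same Hibernate Session that created the query; `detach` is the modern name, `evict` on older Hibernate versions):

```java
// Inside the while (results.next()) loop, after a batch is committed:
if (count % 25 == 0) {
    xactionParticipantService.commitBatch(xactionsBatch);
    // Detach processed entities so the first-level cache stays small.
    for (Xaction processed : xactionsBatch) {
        session.detach(processed); // session.evict(processed) on older Hibernate
    }
    xactionsBatch = new ArrayList<>();
}
// Alternatively, session.clear() drops the entire first-level cache at once,
// which is simpler when nothing else in the session must stay managed.
```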

  2. The following code does not actually optimize JDBC interactions:
    for (Xaction xaction : xactionBatch) {
        try {
            XactionParticipant xp = new XactionParticipant();
            // ... create xp based off Xaction info ...

            // Use native query for efficiency
            String nativeQueryStr = "INSERT INTO XactionParticipant .... xp info/data";
            Query q = em.createNativeQuery(nativeQueryStr);
            q.executeUpdate();
        } catch (Exception e) {
            log.error("Unable to update", e);
        }
    }

Yes, in general plain JDBC should be faster than the JPA API; however, that is not what is happening here: you are inserting records one by one instead of using batch inserts. In order to take advantage of batching, your code should look like:

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void commitBatch(List<Xaction> xactionBatch) {
    session.doWork(connection -> {
        String insert = "INSERT INTO XactionParticipant VALUES (?, ?, ...)";
        try (PreparedStatement ps = connection.prepareStatement(insert)) {
            for (Xaction xaction : xactionBatch) {
                // bind the values derived from this xaction
                // ("val1"/"val2" are placeholders)
                ps.setString(1, "val1");
                ps.setString(2, "val2");
                ps.addBatch();
                ps.clearParameters();
            }
            ps.executeBatch(); // one round trip for the whole batch
        }
    });
}

BTW, Hibernate may do the same if hibernate.jdbc.batch_size is set to a large enough positive integer and the entities are properly designed (id generation backed by a DB sequence with a large enough allocationSize).
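For reference, a minimal sketch of that configuration. The property names (hibernate.jdbc.batch_size, hibernate.order_inserts) are real Hibernate settings, here shown with Spring Boot's spring.jpa.properties prefix; the entity fields and sequence name are illustrative:

```java
// application.properties:
//   spring.jpa.properties.hibernate.jdbc.batch_size=50
//   spring.jpa.properties.hibernate.order_inserts=true

@Entity
public class XactionParticipant {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "xp_seq")
    @SequenceGenerator(name = "xp_seq", sequenceName = "xp_seq",
                       allocationSize = 50) // match batch_size so ids are pre-fetched in blocks
    private Long id;
    // ... other columns ...
}
```

With this in place, plain em.persist(xp) calls inside one transaction get grouped into JDBC batches by Hibernate itself, so the hand-written native INSERT is no longer needed.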
