Spring Boot JPA/JDBC batching findById works but findOneByX not working-CodePudding

I am using Spring Boot JPA, I have enabled batching by ensuring the following lines are in the my application.properties:

spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

I now have a loop where I am doing a findById on an entity and then saving that entity like so:

var entity = dao.findById(id)
// Do some stuff
dao.save(entity) //This line is not really required but I am being explicit here

Putting the above in a loop I see that the save(update) statements are batched to the DB. My issue is that if I do a findOneByX where X is a property on the entity then the batching does not work (batch size of 1), requests get sent one at a time i.e.:

var entity = dao.findOneByX(x)
// Do some stuff
dao.save(entity)

Why is this happening? Is JPA/JDBC only equipped to batch when we findById only?

CodePudding user response：

Solution

Refer to How to implement batch update using Spring Data Jpa?

Fetch the list of entity you want to update to a list
Update as desired
Call saveAll

PS: beware of memory usage for this solution, when your list size is large.

Why `findById` and `findOneByX` behave differently?

As suggested by M. Deinum, hibernate will auto flush your change

prior to executing a JPQL/HQL query that overlaps with the queued entity actions

Since both findById and findOneByX will execute query, what is the different between them?

First, the reason to flush is to make sure session and Database are in same state, hence you can get consistent result from session cache(if available) and database.

When calling findById, hibernate will try to get it from session cache, if entity is not available, fetch it from database. While for findOneByX, we always need to fetch it from database as it is impossible to cache entity by X.

Then we can consider below example:

@Entity
@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
public class Student {
    @Id
    private Long id;
    private String name;
    private int age;
}

Suppose we have | id | name | age | | :- | :--- | :-- | | 1 | Amy | 10 |

@Transactional
public void findByIdAndUpdate() {
    dao.save(new Student(2L, "Dennis", 14));
    // no need to flush as we can get from session
    for (int i = 0; i < 100; i  ) {
        Student dennis = dao.findById(2L).orElseThrow();
        dennis.setAge(i);
        dao.save(dennis);
    }
}

Will result in

412041 nanoseconds spent executing 2 JDBC batches;

1 for insert 1 one for update.

Hibernate: I'm sure that result can be fetch from session (without flush) or database if record is not in session, so let's skip flushing as it is slow!

@Transactional
public void findOneByNameAndUpdate() {
    Student amy = dao.findOneByName("Amy");
    // this affect later query
    amy.setName("Tammy");
    dao.save(amy);
    for (int i = 0; i < 100; i  ) {
        // do you expect getting result here?
        Student tammy = dao.findOneByName("Tammy");
        // Hibernate not smart enough to notice this will not affect later result.
        tammy.setAge(i);
        dao.save(tammy);
    }
}

Will result in

13964088 nanoseconds spent executing 101 JDBC batches;

1 for first update and 100 for update in loop.

Hibernate: Hmm, I'm not sure if stored update will affect the result, better flush the update or I will be blamed by developer.

Solution

Why findById and findOneByX behave differently?

Why `findById` and `findOneByX` behave differently?