JpaItemWriter<T> still writes one item at a time instead of in batch

Time:08-20

I have a question about write operations against databases in Spring Batch through the ItemWriter<T> contract. To quote from The Definitive Guide to Spring Batch by Michael T. Minella:

All of the items are passed in a single call to the ItemWriter where they can be written out at once. This single call to the ItemWriter allows for IO optimizations by batching the physical write. [...] Chunks are defined by their commit intervals. If the commit interval is set to 50 items, then your job reads in 50 items, processes 50 items, and then writes out 50 items at once.

Yet when I use, say, HibernateItemWriter or JpaItemWriter in a chunk-oriented step to write to the database in a Spring-Boot-based app, with all the Spring Batch infrastructure in place (@EnableBatchProcessing, StepBuilderFactory/JobBuilderFactory, etc.) and monitoring tools that count insert/update statements (such as MethodInterceptor implementations), I notice that the number of inserts performed by the writer equals the total number of records to process, not the number of chunks configured for that job.

For example, upon inspecting the logs in IntelliJ from a job execution of 10 items with a chunk size of 5, I found 10 insert statements

Query:["insert into my_table (fields...

instead of 2. I also checked for insert statements in the general_log_file for my RDS instance and found two 'Prepare insert' statements and one 'Execute insert' statement for each item to process.

Now I understand that in a writer such as JpaItemWriter<T>, the write(List<? extends T> items) method loops through the items, calling entityManager.persist/merge(item) for each one (thereby inserting a new row into the corresponding table) and finally calling entityManager.flush(). But where is the performance gain provided by batch processing, if there is any?
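To make the observed behavior concrete, here is a simplified, self-contained sketch of that write loop (CountingEntityManager is a made-up stand-in for the JPA EntityManager, used only to count operations; it is not the real interface or the framework's source):

```java
import java.util.List;

// Toy stand-in for an EntityManager, just to count operations
// (hypothetical, for illustration only -- not the JPA interface).
class CountingEntityManager {
    int mergeCalls = 0;
    int flushCalls = 0;
    void merge(Object item) { mergeCalls++; }
    void flush() { flushCalls++; }
}

// Simplified sketch of the write loop described above:
// one merge per item, one flush per chunk.
class SketchJpaItemWriter<T> {
    private final CountingEntityManager em;
    SketchJpaItemWriter(CountingEntityManager em) { this.em = em; }

    void write(List<? extends T> items) {
        for (T item : items) {
            em.merge(item);   // one persist/merge call per item -> one INSERT per item
        }
        em.flush();           // single flush at the end of the chunk
    }
}

public class Demo {
    public static void main(String[] args) {
        CountingEntityManager em = new CountingEntityManager();
        SketchJpaItemWriter<String> writer = new SketchJpaItemWriter<>(em);
        writer.write(List.of("a", "b", "c", "d", "e")); // one chunk of 5 items
        System.out.println(em.mergeCalls + " merges, " + em.flushCalls + " flush");
    }
}
```

Running this for a chunk of 5 items shows 5 merge calls and a single flush, which matches the one-INSERT-per-item pattern seen in the logs.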

CodePudding user response:

where is the performance gain provided by the batch processing, if there is any?

There is a performance gain, and it is provided by the chunk-oriented processing model that Spring Batch offers, in the sense that all these insert statements are executed in a single transaction:

start transaction
INSERT INTO table ... VALUES ...
INSERT INTO table ... VALUES ...
...
INSERT INTO table ... VALUES ...
end transaction

You would see a performance hit if there were a transaction for each item, something like:

start transaction
INSERT INTO table ... VALUES ...
end transaction
start transaction
INSERT INTO table ... VALUES ...
end transaction
...

But that is not the case with Spring Batch, unless you set the chunk-size to 1 (but that defeats the goal of using such a processing model in the first place).

So yes, even if you see multiple insert statements, that does not mean that there are no batch inserts. Check the transaction boundaries in your DB logs and you should see a transaction around each chunk, not around each item.
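A further note on the statement counts themselves: with Hibernate as the JPA provider, the per-item inserts can additionally be grouped into JDBC batches by enabling statement batching, which is a provider setting rather than something JpaItemWriter does for you. A typical Spring Boot application.properties fragment (assuming Hibernate) would be:

```properties
# Enable Hibernate JDBC statement batching (Hibernate-specific settings)
spring.jpa.properties.hibernate.jdbc.batch_size=50
# Group statements by entity so batches are more effective
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```

Be aware that Hibernate disables insert batching for entities whose identifier uses the IDENTITY generation strategy, since the generated key must be retrieved after each insert.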


As a side note, from my experience, using raw JDBC performs better than JPA (with any provider) when dealing with large inserts/updates.
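For reference, the JDBC route in Spring Batch is JdbcBatchItemWriter, which submits the whole chunk as a single JDBC batch update. A minimal configuration sketch (the Person class, table, and column names are made up for illustration):

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;

// Hypothetical item type for illustration.
class Person {
    private String firstName;
    private String lastName;
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
}

class WriterConfig {
    // The whole chunk is sent to the database as one JDBC batch.
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .dataSource(dataSource)
                .sql("INSERT INTO my_table (first_name, last_name) VALUES (:firstName, :lastName)")
                .beanMapped() // bind :firstName/:lastName from Person getters
                .build();
    }
}
```

This skips the persistence context entirely, which is usually why it outperforms JPA for large inserts/updates.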
