I would like to know whether this way of implementing a Spring Batch reader with JPA is recommended, or whether it is better to look for another solution. If this approach is not recommended, where can I find information on a better option?
import java.util.Iterator;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;

public class CreditCardItemReader implements ItemReader<CreditCard> {

    @Autowired
    private CreditCardRepository repository;

    private Iterator<CreditCard> creditCardIterator;

    @BeforeStep
    public void before(StepExecution stepExecution) {
        // load the whole result set once before the step starts
        creditCardIterator = repository.someQuery().iterator();
    }

    @Override
    public CreditCard read() {
        // return items one by one; null signals the end of the data
        if (creditCardIterator != null && creditCardIterator.hasNext()) {
            return creditCardIterator.next();
        } else {
            return null;
        }
    }
}
Answer:
This implementation is acceptable only for small datasets, because the data is read by a single query and the whole result list is kept in memory. It is also not thread-safe.
When loading large volumes it:
- can lead to an out-of-memory error on an environment with limited memory
- can lead to performance problems, because processing has to wait until thousands of records have been loaded from the database in one call
Solution 1, org.springframework.batch.item.database.JpaCursorItemReader
A similar implementation is defined out of the box in Spring Batch: JpaCursorItemReader
The main difference is that this implementation works with a specific JPQL query instead of a repository and uses JPA's Query.getResultStream() method to obtain the query results.
Excerpt from the JpaCursorItemReader implementation:
protected void doOpen() throws Exception {
    ...
    Query query = createQuery();
    if (this.parameterValues != null) {
        this.parameterValues.forEach(query::setParameter);
    }
    this.iterator = query.getResultStream().iterator();
}
Hibernate, for example, introduced the Query.getResultStream() method in version 5.2. It uses Hibernate's ScrollableResults implementation to move through the result set and fetch the records in batches. That prevents you from loading all records of the result set at once and allows you to process them more efficiently.
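For illustration, this is roughly how the same streaming looks when calling JPA directly; a minimal sketch, where the entity and query are assumptions (and the import is jakarta.persistence on recent versions):

import java.util.stream.Stream;

import javax.persistence.EntityManager;

public class CreditCardStreamingExample {

    // Illustrative only: the result set is consumed lazily instead of being
    // materialized as one big list in memory.
    public static void streamCreditCards(EntityManager entityManager) {
        Stream<CreditCard> cards = entityManager
                .createQuery("select c from CreditCard c", CreditCard.class)
                .getResultStream();
        cards.forEach(card -> {
            // process each record as it is streamed from the database
        });
    }
}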
Example of creation:
protected ItemReader<Foo> getItemReader() throws Exception {
    String jpqlQuery = "from Foo";

    JpaCursorItemReader<Foo> itemReader = new JpaCursorItemReader<>();
    itemReader.setName("fooReader"); // give the reader a name so its state can be stored in the ExecutionContext
    itemReader.setQueryString(jpqlQuery);
    // entityManagerFactory should be the application's configured
    // EntityManagerFactory (e.g. injected), not a freshly created factory bean
    itemReader.setEntityManagerFactory(entityManagerFactory);
    itemReader.setSaveState(true);
    itemReader.afterPropertiesSet(); // validate the configuration after all properties are set
    return itemReader;
}
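Spring Batch also provides a builder for this reader, JpaCursorItemReaderBuilder (available since Spring Batch 4.3, the version that introduced JpaCursorItemReader). A minimal sketch, where the reader name and query are illustrative:

import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaCursorItemReader;
import org.springframework.batch.item.database.builder.JpaCursorItemReaderBuilder;

// Illustrative factory method; entityManagerFactory is assumed to be the
// application's configured EntityManagerFactory.
protected JpaCursorItemReader<Foo> getCursorItemReader(EntityManagerFactory entityManagerFactory) {
    return new JpaCursorItemReaderBuilder<Foo>()
            .name("fooCursorReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString("from Foo")
            .build();
}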
Solution 2, org.springframework.batch.item.database.JpaPagingItemReader
It is a more flexible solution for JPQL queries than JpaCursorItemReader: the ItemReader loads and keeps data page by page, and it is thread-safe.
According to the documentation:
ItemReader for reading database records built on top of JPA.
It executes the JPQL setQueryString(String) to retrieve requested data. The query is executed using paged requests of a size specified in AbstractPagingItemReader.setPageSize(int). Additional pages are requested when needed as AbstractItemCountingItemStreamItemReader.read() method is called, returning an object corresponding to current position.
The performance of the paging depends on the JPA implementation and its use of database specific features to limit the number of returned rows.
Setting a fairly large page size and using a commit interval that matches the page size should provide better performance.
In order to reduce the memory usage for large results the persistence context is flushed and cleared after each page is read. This causes any entities read to be detached. If you make changes to the entities and want the changes persisted then you must explicitly merge the entities.
The implementation is thread-safe in between calls to open(ExecutionContext), but remember to use saveState=false if used in a multi-threaded client (no restart available).
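A minimal creation sketch for this reader, using JpaPagingItemReaderBuilder with the same illustrative Foo entity and query as above (the reader name and page size are assumptions):

import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;

// Illustrative factory method; entityManagerFactory is assumed to be the
// application's configured EntityManagerFactory.
protected JpaPagingItemReader<Foo> getPagingItemReader(EntityManagerFactory entityManagerFactory) {
    return new JpaPagingItemReaderBuilder<Foo>()
            .name("fooPagingReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString("from Foo")
            .pageSize(100) // keep the page size aligned with the chunk/commit interval
            .build();
}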
Solution 3, org.springframework.batch.item.data.RepositoryItemReader
It is a more convenient solution when a Spring Data repository is already available: the reader loads data page by page through the repository, and it is thread-safe.
According to the documentation:
A ItemReader that reads records utilizing a PagingAndSortingRepository.
Performance of the reader is dependent on the repository implementation, however setting a reasonably large page size and matching that to the commit interval should yield better performance.
The reader must be configured with a PagingAndSortingRepository, a Sort, and a pageSize greater than 0.
This implementation is thread-safe between calls to AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but remember to use saveState=false if used in a multi-threaded client (no restart available).
Example of creation:
// repository is a Spring Data PagingAndSortingRepository, typically injected
PagingAndSortingRepository<Foo, Long> repository = ...;

RepositoryItemReader<Foo> reader = new RepositoryItemReader<>();
reader.setRepository(repository);   // the PagingAndSortingRepository used to read input from
reader.setMethodName("findByName"); // the repository method to call
reader.setArguments(arguments);     // arguments to be passed to the data-providing method
reader.setSort(Collections.singletonMap("id", Sort.Direction.ASC)); // a Sort is required
reader.setPageSize(100);            // a page size greater than 0 is required
Creation via builder:
// reusing the repository from the previous example
RepositoryItemReader<Foo> reader = new RepositoryItemReaderBuilder<Foo>()
        .name("fooRepositoryReader") // required because saveState is true by default
        .repository(repository)
        .methodName("findByName")
        .arguments(new ArrayList<>())
        .sorts(Collections.singletonMap("id", Sort.Direction.ASC))
        .pageSize(100)
        .build();
More examples of usage: RepositoryItemReaderTests and RepositoryItemReaderIntegrationTests
To summarise:
Your implementation is good only for simple use cases with small datasets.
I recommend using the out-of-the-box solutions instead.