Spring Batch - use JpaPagingItemReader to read lists instead of individual items


Spring Batch is designed to read and process one item at a time, then write the list of all items processed in a chunk. I want my item to also be a List<T>, so that a whole list is read and processed at a time, and a List<List<T>> is written. My data source is a standard Spring JpaRepository<T, ID>.

My question is whether there are some standard solutions for this "aggregated" approach. I see that there are some, but they don't read from a JpaRepository.

Update:

I'm looking for a solution that would work for a rapidly changing dataset and in a multithreading environment.

CodePudding user response:

I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>.

Spring Batch is not (and should not be) aware of what an "item" is. It is up to you to design what an "item" is and how it is implemented (a single value, a list, a stream, etc). In your case, you can encapsulate the List<T> in a type that can be used as an item, and process the data as needed. You would need a custom item reader, though.
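For illustration, here is a minimal sketch of such an aggregating reader. All names here (AggregatingReader, aggregationSize) are made up for the example, and a plain Iterator stands in for the delegate ItemReader so the sketch is self-contained:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of an "aggregating" reader: it pulls single items from a delegate
// (a plain Iterator standing in for Spring Batch's ItemReader) and exposes
// them as List<T> items of up to aggregationSize elements each.
class AggregatingReader<T> {
    private final Iterator<T> delegate;
    private final int aggregationSize;

    AggregatingReader(Iterator<T> delegate, int aggregationSize) {
        this.delegate = delegate;
        this.aggregationSize = aggregationSize;
    }

    // Returns the next List<T> "item", or null when the delegate is
    // exhausted, mirroring the ItemReader contract where null signals
    // the end of the input.
    public List<T> read() {
        List<T> results = new ArrayList<>();
        while (delegate.hasNext() && results.size() < aggregationSize) {
            results.add(delegate.next());
        }
        return results.isEmpty() ? null : results;
    }
}
```

In a real job, the delegate would be the actual ItemReader (e.g. a JpaPagingItemReader), and the step would be typed to read List<T> items.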

CodePudding user response:

The solution we found is to use a custom aggregate reader as suggested here, which accumulates the read data into a list of a given size then passes it along. For our specific use case, we read data using a JpaPagingItemReader. The relevant part is:

    public List<T> read() throws Exception {
        ResultHolder holder = new ResultHolder();

        // read until no more results available or aggregated size is reached
        while (!itemReaderExhausted && holder.getResults().size() < aggregationSize) {
            process(itemReader.read(), holder);
        }

        if (CollectionUtils.isEmpty(holder.getResults())) {
            return null;
        }

        return holder.getResults();
    }

    private void process(T readValue, ResultHolder resultHolder) {
        if (readValue == null) {
            itemReaderExhausted = true;
            return;
        }
        resultHolder.addResult(readValue);
    }
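The ResultHolder used above is not shown in the answer; a minimal, hypothetical version could be as simple as:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal ResultHolder matching its use in the snippet above:
// it simply accumulates the items read so far for one aggregated "item".
class ResultHolder<T> {
    private final List<T> results = new ArrayList<>();

    List<T> getResults() {
        return results;
    }

    void addResult(T value) {
        results.add(value);
    }
}
```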

In order to account for the volatility of the dataset, we extended the JPA reader and overrode the getPage() method to always return 0, and controlled the dataset through the processor and writer so that the next batch of fresh data is always fetched from the first page. The hint was given here and in some other SO answers.

    @Override
    public int getPage() {
        return 0;
    }
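The effect of pinning the page can be sketched outside of Spring Batch with a small simulation (all names here are illustrative): because the writer removes the rows it has processed from the backing store, re-reading page 0 always yields the next unprocessed rows.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained illustration (no Spring Batch) of why fixing the page at 0
// works for a shrinking dataset: the "writer" deletes the rows it has
// processed, so the first page always contains the next unread rows.
class PinnedPageSimulation {
    // Reads one page of pageSize rows from the table, like a paging reader.
    static List<String> readPage(List<String> table, int page, int pageSize) {
        int from = page * pageSize;
        if (from >= table.size()) {
            return List.of();
        }
        int to = Math.min(from + pageSize, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    // Repeatedly reads page 0 and removes the processed rows, collecting
    // each page as one chunk, until the store is empty.
    static List<List<String>> run(List<String> table, int pageSize) {
        List<String> store = new ArrayList<>(table);
        List<List<String>> chunks = new ArrayList<>();
        while (true) {
            List<String> page = readPage(store, 0, pageSize); // page pinned at 0
            if (page.isEmpty()) {
                break;
            }
            chunks.add(page);
            store.subList(0, page.size()).clear(); // "writer" consumes the rows
        }
        return chunks;
    }
}
```

If the reader incremented the page as usual while the writer deleted rows, every second page would be skipped; pinning the page to 0 avoids that.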