Let's say I have this (Spring Boot) code:
List<User> userList = userService.selectAll(); // this returns 1,000,000 rows
customerService.saveBulk(userList).get();
I want to split the list into smaller chunks and call saveBulk on each chunk iteratively.
Is there a smart way to do this using Java streams?
saveBulk is annotated with @Async.
CodePudding user response:
You can use Collectors.groupingBy() for this and then iterate over the resulting groups.
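import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final List<User> users = userService.selectAll();
final int partitionSize = 10000;
final AtomicInteger counter = new AtomicInteger(); // stateful counter: only safe on a sequential stream
users
    .stream()
    .collect(Collectors.groupingBy(x -> counter.getAndIncrement() / partitionSize))
    .values()
    // now you have a Collection<List<User>>
    // each list contains at most partitionSize elements
    .stream()
    .map(group -> customerService.saveBulk(group))
    .forEach(future -> {
        try {
            future.get(); // wait for this batch; get() throws checked exceptions
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    });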
My Java is a little rusty, so treat this as a starting point. If you want to make your own method @Async as well, you might prefer to return a single combined Future<T> instead of calling .get() on the futures one by one.
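A minimal sketch of the "single combined future" idea, assuming saveBulk returns a CompletableFuture (the question only shows that it has a .get() method, so this is an assumption). The class CombinedSave, the partition helper, and the saveBulk function parameter are hypothetical names used for illustration; the function parameter stands in for customerService.saveBulk. It also splits the list with subList rather than groupingBy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class CombinedSave {

    // Splits the list into consecutive sublists of at most partitionSize elements.
    static <T> List<List<T>> partition(List<T> items, int partitionSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += partitionSize) {
            int to = Math.min(from + partitionSize, items.size());
            chunks.add(items.subList(from, to)); // a view, not a copy
        }
        return chunks;
    }

    // Submits every chunk and returns one future that completes once all saves have completed.
    // saveBulk is a stand-in for customerService.saveBulk (assumed to return a CompletableFuture).
    static <T> CompletableFuture<Void> saveAll(
            List<T> items,
            int partitionSize,
            Function<List<T>, CompletableFuture<Void>> saveBulk) {
        CompletableFuture<?>[] futures = partition(items, partitionSize).stream()
                .map(saveBulk)
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures);
    }
}
```

Note that this submits all chunks up front, so the batches run concurrently on whatever executor backs @Async. If you need strictly one batch at a time, wait on each future before submitting the next one.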
Depending on the number of users, it might be better to load chunks of users individually, perform the needed actions on them, and save them again. In this solution all users are loaded into memory by userService.selectAll() at the beginning, which might be too much data (1,000,000 rows) to hold at once.
I think the best approach would be to paginate userService.selectAll(): query 10,000 users, do your work, and save those 10,000 users back. For this you need some kind of pagination of your data.
If your backend is a database with a Hibernate ORM, you can do it roughly like this:
- Make your backing userRepository a PagingAndSortingRepository
- Iterate over the batch-sized pages to operate in batches
Then the logic would be something like:
UserRepository:
@Repository
public interface UserRepository extends PagingAndSortingRepository<User, Long> {
}
UserService:
public Page<User> findPaginated(int pageNo, int pageSize) {
    Pageable paging = PageRequest.of(pageNo, pageSize);
    // the repository is typed on User, so the page holds User entities
    Page<User> pagedResult = userRepository.findAll(paging);
    return pagedResult;
}
Your Logic:
int pageSize = 10000;
int totalPages = userService
    .findPaginated(0, pageSize).getTotalPages();
// Then you can iterate over all the pages however you like
for (int i = 0; i < totalPages; i++) {
    // getContent() unwraps the Page into the List for this page
    List<User> batch = userService.findPaginated(i, pageSize).getContent();
    // do your stuff with the batch
    customerService.saveBulk(batch);
}
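Because saveBulk is @Async, the loop above submits each batch without waiting for it, so several batches can be in flight (and in memory) at once. Here is a rough sketch of a variant that waits for each batch before loading the next page. It is written against plain functions so it runs without Spring: fetchPage stands in for userService.findPaginated(...).getContent() and saveBulk for customerService.saveBulk, both hypothetical parameters, and saveBulk is assumed to return a CompletableFuture. Instead of asking for getTotalPages(), it stops at the first empty page.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

class PagedBatchSave {

    // Loads one page at a time, saves it, and waits for the save before loading the next page.
    // Assumes fetchPage returns an empty list once pageNo is past the last page.
    // Returns the number of items processed.
    static <T> long saveInPages(
            int pageSize,
            BiFunction<Integer, Integer, List<T>> fetchPage,
            Function<List<T>, CompletableFuture<Void>> saveBulk) {
        long processed = 0;
        for (int pageNo = 0; ; pageNo++) {
            List<T> batch = fetchPage.apply(pageNo, pageSize);
            if (batch.isEmpty()) {
                break; // past the last page
            }
            saveBulk.apply(batch).join(); // block until this batch has been saved
            processed += batch.size();
        }
        return processed;
    }
}
```

With this shape only one page of users is held in memory at a time, at the cost of giving up any parallelism between batches.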