Java WebClient - collects all objects from a paginated website

Time:03-18

I want to iterate through all pages of a given URL and collect the JSON objects. With this code I'm getting a list of 10 objects:

List<EzamowieniaDto> ezam = WebClient
            .create("https://ezamowienia.gov.pl/mo-board/api/v1/Board/Search?noticeType=ContractNotice&isTenderAmountBelowEU=true" +
                    "&publicationDateFrom=2022-03-16T00:00:00.000Z&orderType=Delivery&SortingColumnName=PublicationDate&SortingDirection=DESC" +
                    "&PageNumber=1")
            .get()
            .retrieve()
            .bodyToMono(new ParameterizedTypeReference<List<EzamowieniaDto>>(){})
            .block();

I've tried to just delete "PageNumber" from request, but it seems the pagination is hard-coded for this page.

(X-Pagination header from response: [{"TotalCount":88,"PageSize":10,"CurrentPage":1,"TotalPages":9,"HasNext":true,"HasPrevious":false}])

The question is: how can I iterate through the number of pages mentioned in the response header and collect all of the data?

CodePudding user response:

First, avoid the .block() method: it waits for the result on the calling thread, which turns the asynchronous stream into a synchronous call, and at that point there is little benefit in using WebClient at all (here you can find some brief intro). If a synchronous client is all you need, you could use RestTemplate or a library such as Retrofit instead. To keep the asynchronous pattern in your case, you can use the following approach:

Flux<EzamowieniaDto> ezam = WebClient
            .create("https://ezamowienia.gov.pl/mo-board/api/v1/Board/Search?noticeType=ContractNotice&isTenderAmountBelowEU=true" +
                    "&publicationDateFrom=2022-03-16T00:00:00.000Z&orderType=Delivery&SortingColumnName=PublicationDate&SortingDirection=DESC" +
                    "&PageNumber=1")
            .get()
            .retrieve()
            .bodyToFlux(EzamowieniaDto.class) // a Flux is, roughly, the asynchronous counterpart of a List
            .map(body -> {
                // body is an EzamowieniaDto; do any work with this object here
                return body;
            })
            ...

Example

...

List<EzamowieniaDto> dtos = new ArrayList<>();

Disposable subscription = WebClient
            .create("http://some-url.com")
            .get()
            .retrieve()
            .bodyToFlux(EzamowieniaDto.class)
            .filter(body -> body.getId().equals(1L)) // just some filter for the emitted elements
            .subscribe(dto -> dtos.add(dto)); // subscribing starts the asynchronous flow; it returns a Disposable, not a Flux

// subscribe() does not wait for the response, so dtos may still be empty at this point
System.out.println(dtos.get(0).getId().equals(1L)); // some simple test or something like this, use dtos as you need.
            

Additionally

If you mix synchronous stuff (Lists, Mono of List, etc.) with asynchronous code, you will always get synchronous behavior at the point in your code where the two meet. Reactive programming implies staying asynchronous (and mostly declarative) through the whole process, from fetching the response asynchronously to writing to the database asynchronously (with Hibernate Reactive, for example).

Hope it helps somehow. If you have not started yet, I suggest learning reactive programming (Reactor or Spring WebFlux, for example) to understand the basics of asynchronous programming.

Best Regards, Anton.

CodePudding user response:

Here is how you could handle paginated requests with WebClient.

  1. Create a method to retrieve a single page of data. Typically you would use bodyToFlux(EzamowieniaDto.class) and return Flux<EzamowieniaDto>, but because we need the headers we have to use toEntityFlux(EzamowieniaDto.class), which wraps the response in Mono<ResponseEntity<Flux<EzamowieniaDto>>>.
Mono<ResponseEntity<Flux<EzamowieniaDto>>> getPage(String url, int pageNumber) {
    return webClient.get()
            .uri(url + "&PageNumber={pageNum}", pageNumber)
            .retrieve()
            .toEntityFlux(EzamowieniaDto.class);
}
  2. Use expand to fetch data until we reach the end.
Flux<EzamowieniaDto> getData(String url) {
    return getPage(url, 1)
            .expand(response -> {
                // fromJson is a helper (not shown) that parses the header value into a Pagination object
                Pagination pagination = fromJson(response.getHeaders().getFirst("X-Pagination"));
                if (!pagination.hasNext()) {
                    // stop
                    return Mono.empty(); 
                }

                // fetch next page
                return getPage(url, pagination.getCurrentPage() + 1);
            })
            .flatMap(response -> response.getBody());
}
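
The Pagination type and the helper that turns the X-Pagination header into it are not shown in the answer above. Here is a minimal sketch of what they could look like, assuming the header always has the shape quoted in the question. The class layout and the parse method name are my own assumptions, and the regex is only there to keep the sketch free of dependencies; in a Spring project you would more likely deserialize the header with Jackson's ObjectMapper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the values of the X-Pagination header that getData needs.
final class Pagination {
    private final int currentPage;
    private final int totalPages;
    private final boolean hasNext;

    Pagination(int currentPage, int totalPages, boolean hasNext) {
        this.currentPage = currentPage;
        this.totalPages = totalPages;
        this.hasNext = hasNext;
    }

    int getCurrentPage() { return currentPage; }

    int getTotalPages() { return totalPages; }

    boolean hasNext() { return hasNext; }

    // Parses a header value such as
    // [{"TotalCount":88,"PageSize":10,"CurrentPage":1,"TotalPages":9,"HasNext":true,"HasPrevious":false}]
    // (assumed format, taken from the question).
    static Pagination parse(String headerValue) {
        return new Pagination(
                Integer.parseInt(field(headerValue, "CurrentPage")),
                Integer.parseInt(field(headerValue, "TotalPages")),
                Boolean.parseBoolean(field(headerValue, "HasNext")));
    }

    // Extracts the raw value of a numeric or boolean JSON field by name.
    private static String field(String json, String name) {
        Matcher m = Pattern
                .compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(\\w+)")
                .matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing field " + name + " in X-Pagination header");
        }
        return m.group(1);
    }
}
```

With something like this in place, getData(url).collectList() would give you a Mono<List<EzamowieniaDto>> holding the items from all pages.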