Home > front end >  Spring JPA Paging to Stream
Spring JPA Paging to Stream

Time:11-19

We are currently using cockroach DB, but this doesn't support returning X records at a time presumably due to lack of cursor support. This means that when trying to stream a large number (~10 million) of records a full ResultSet is returned by the DB, causing the app to fall over due to running out of memory.

Cockroach recommends using Pagination (ideally keyset) for retrieving large numbers of results, but is there a nice way of reading all pages and returning a Stream, without loading all results into memory at any point?

Thanks!

CodePudding user response:

As long as you don't have multiple operations happening at the same time on the same connection, you should be able to set the JDBC fetchSize property and control how many results are returned.

CodePudding user response:

As vendor documentation proposes...

The general pattern for keyset pagination queries is:

SELECT * FROM t AS OF SYSTEM TIME ${time}
WHERE key > ${value}
ORDER BY key
LIMIT ${amount}

This is faster than using LIMIT/OFFSET because, instead of doing a full table scan up to the value of the OFFSET, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the WHERE clause to implement the pagination is indexed and unique. A primary key meets both of these criteria.

Note: CockroachDB does not have cursors. To support a cursor-like use case, namely "operate on a snapshot of the database at the moment the cursor is opened", use the AS OF SYSTEM TIME clause as shown in the examples below.

It means or the user (interface) provides or "we" store:

  • $value, which refers to the "last seen key" per pagination request(/session!)

additionlly, if we want that "cursor like" behavior, we need to provide/store:

  • ${time}, which refers to the last request(/session) pagination timestamp.

When we provide the additional parameters, we can do it all in one repository query:

public interface SomeRepository extends JpaRepository<Some, Long> {

    @Query(
      value = "SELECT * FROM SOME [, ....]        "  
              "AS OF SYSTEM TIME follower_read_timestamp()"  // -> [5]
              "[JOIN ...]                                 " 
              "WHERE [... AND] s.ID > :lastKey",
      // If you want/need the count
      countQuery = "SELECT count(*) FROM <[SAME_QUERY]>",
      nativeQuery = true)
      // Pageable if you  know the total size, Slice otherwise ;)
      Pageable<Some> findCustom(/* more paramas?, */
        @Param("lastKey") Long lastKey, Pageable pageable);
      // done(?;)
}

... and maintain discreet hope, that only pageable.pageSize items will be fetched (from db & loaded to memory).

Some Refs:

  • Related