Quite frequently, I'd like to retrieve only the first N
rows from a query, but I don't know in advance what N
will be. For example:
try (var stream = sql.selectFrom(JOB)
        .where(JOB.EXECUTE_AFTER.le(clock.instant()))
        .orderBy(JOB.PRIORITY.asc())
        .forUpdate()
        .skipLocked()
        .fetchSize(100)
        .fetchLazy()) {

    // Now run as many jobs as we can in 10s
    ...
}
Now, without an arbitrary LIMIT clause, the PostgreSQL query planner sometimes decides to do a painfully slow sequential table scan for such queries, afaik because it assumes I'm going to fetch every row in the result set. An arbitrary LIMIT kind of works for simple cases like this one, but I don't like it at all because:
- a) the limit is only there to "trick" the query planner into doing the right thing; it's not there because my intent is to fetch at most N rows.
- b) when things get a little more sophisticated and you have multiple such queries that somehow depend on each other, choosing an N large enough not to break your code can be hard. You don't want to be the next person who has to understand that code.
- c) finding out that the query is unexpectedly slow usually happens in production, where your tables contain a few million/billion rows. Totally avoidable if only the DB didn't insist on being smarter than the developer.
- d) I'm getting tired of writing detailed comments that explain why the queries have to look like this-and-that (i.e. why the query planner screws up if I don't add this arbitrary limit).
So, how do I tell the query planner that I'm only going to fetch a few rows and getting the first row quickly is the priority here? Can this be achieved using the JDBC API/driver?
(Note: I'm not looking for server configuration tweaks that indirectly influence the query planner, like tinkering with random page costs, nor can I accept a workaround like SET enable_seqscan = off.)
(Note 2: I'm using jOOQ in the example code for clarity, but under the hood this is just another PreparedStatement using ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY, so afaik we can rule out wrong statement modes.)
(Note 3: I'm not in autocommit mode ;-))
Thanks in advance for any hints, can't imagine I'm the first to have run into this :)
CodePudding user response:
PostgreSQL is smarter than you think. All you need to do is set the fetch size of the java.sql.Statement to a value different from 0 using the setFetchSize method. Then the JDBC driver will create a cursor and fetch the result set in chunks. Any query planned that way will be optimized for fast fetching of the first 10% of the data (this is governed by the PostgreSQL parameter cursor_tuple_fraction). Even if the query performs a sequential scan of a table, not all the rows have to be read: reading will stop as soon as no more result rows are fetched.
I have no idea how to use JDBC methods with your ORM, but there should be a way.
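In plain JDBC, the advice above comes down to two things: turn off autocommit (cursor-based fetching only works inside a transaction) and set a non-zero fetch size. A minimal sketch, assuming a hypothetical job table and connection details that are not from the original post:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret")) {
            // Cursor-based fetching requires autocommit to be off
            con.setAutoCommit(false);

            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT id, priority FROM job WHERE execute_after <= now() "
                    + "ORDER BY priority FOR UPDATE SKIP LOCKED")) {
                // Non-zero fetch size: the driver declares a cursor and fetches
                // the result set in chunks of 100 rows
                ps.setFetchSize(100);

                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        processJob(rs.getLong("id"));
                    }
                }
            }
            con.commit();
        }
    }

    private static void processJob(long id) { /* run the job */ }
}
```

If you stop iterating the ResultSet early, the remaining chunks are simply never requested from the server.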
CodePudding user response:
In case the JDBC fetchSize()
method doesn't suffice as a hint to get the behaviour you want, you could make use of explicit server side cursors like this:
ctx.transaction(c -> {
    c.dsl().execute("declare c cursor for {0}", c.dsl()
        .selectFrom(JOB)
        .where(JOB.EXECUTE_AFTER.le(clock.instant()))
        .orderBy(JOB.PRIORITY.asc())
        .forUpdate()
        .skipLocked()
    );

    try {
        JobRecord r;

        // The fetch must run on the same connection that declared the cursor,
        // hence c.dsl() rather than an unrelated DSLContext
        while ((r = c.dsl().resultQuery("fetch forward 1 from c")
                     .coerce(JOB)
                     .fetchOne()) != null) {
            System.out.println(r);
        }
    }
    finally {
        c.dsl().execute("close c");
    }
});
There's a pending feature request to support the above also in the DSL API (see #11407), but the above example shows that this can still be done in a type-safe way using plain SQL templating and the ResultQuery::coerce method.
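Nothing forces you to fetch one row per round trip, either: the same cursor can be drained in batches. A sketch under the same assumptions as the answer above (cursor "c" already declared inside the transaction, with an arbitrary batch size of 50):

```java
// Drain the cursor in batches of 50 instead of single rows, reducing
// client/server round trips. Runs inside the same transaction lambda "c"
// in which the cursor was declared.
Result<JobRecord> batch;
do {
    batch = c.dsl().resultQuery("fetch forward 50 from c")
             .coerce(JOB)
             .fetch();

    for (JobRecord r : batch)
        System.out.println(r);
}
// An empty result means the cursor is exhausted
while (!batch.isEmpty());
```

The trade-off is the usual one: larger batches mean fewer round trips but more rows locked and buffered per iteration.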