Short version, I have a SQL statement where I only want the results if the number of rows returned is less than some value (say 1000) and otherwise I want a different result set. What's the best way to do this without incurring the overhead of returning the 1000 rows (as would happen if I used limit) when I'm just going to throw them away?
For instance, I want to return the results of
SELECT *
FROM T
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name ASC
provided there are at most 1000 entries but if there are more than that I want to return
SELECT *
FROM T
ORDER BY name ASC
LIMIT 25
Two queries isn't bad, but I definitely don't want to get 1000 records back from the first query only to toss them.
(Happy to use Postgres extensions too but prefer SQL)
--
To explain I'm refreshing data requested by client in batches and sometimes the client needs to know if there have been any changes in the part they've already received. If there are too many changes, however, I'm just giving up and starting to send the records from the start again.
CodePudding user response:
WITH max1000 AS (
SELECT the_row, count(*) OVER () AS total
FROM (
SELECT the_row -- named row type
FROM T AS the_row
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name
LIMIT 1001
) sub
)
SELECT (the_row).* -- parentheses required
FROM max1000 m
WHERE total < 1001
UNION ALL
( -- parentheses required
SELECT *
FROM T
WHERE (SELECT total > 1000 FROM max1000 LIMIT 1)
ORDER BY name
LIMIT 25
)
The subquery sub
in CTE max1000
gets the complete, sorted result for the first query - wrapped as row type, and with LIMIT 1001
to avoid excess work.
The outer SELECT
adds the total row count. See:
The first SELECT
of the outer UNION
query returns decomposed rows as result - if there are less than 1001 of them.
The second SELECT
of the outer UNION
query returns the alternate result - if there were more than 1000. Parentheses are required - see:
Or:
WITH max1000 AS (
SELECT *
FROM T
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name
LIMIT 1001
)
, ct(ok) AS (SELECT count(*) < 1001 FROM max1000)
SELECT *
FROM max1000 m
WHERE (SELECT ok FROM ct)
UNION ALL
( -- parentheses required
SELECT *
FROM T
WHERE (SELECT NOT ok FROM ct)
ORDER BY name
LIMIT 25
);
I think I like the 2nd better. Not sure which is faster.
Either optimizes performance for less than 1001 rows in most calls. If that's the exception, I would first run a somewhat cheaper count. Also depends a lot on available indexes ...
You get no row if the first query finds no row. (Seems like an odd result.)