Home > Back-end >  SQL: Give up/return different result if too many rows
SQL: Give up/return different result if too many rows

Time:05-23

Short version, I have a SQL statement where I only want the results if the number of rows returned is less than some value (say 1000) and otherwise I want a different result set. What's the best way to do this without incurring the overhead of returning the 1000 rows (as would happen if I used limit) when I'm just going to throw them away?

For instance, I want to return the results of

SELECT * 
FROM T 
WHERE updated_at > timestamp 
  AND name <= 'Michael' 
ORDER BY name ASC

provided there are at most 1000 entries but if there are more than that I want to return

SELECT * 
FROM T 
ORDER BY name ASC 
LIMIT 25

Two queries isn't bad, but I definitely don't want to get 1000 records back from the first query only to toss them.

(Happy to use Postgres extensions too but prefer SQL)

--

To explain I'm refreshing data requested by client in batches and sometimes the client needs to know if there have been any changes in the part they've already received. If there are too many changes, however, I'm just giving up and starting to send the records from the start again.

CodePudding user response:

WITH max1000 AS (
   SELECT the_row, count(*) OVER () AS total
   FROM  (
      SELECT the_row  -- named row type
      FROM   T AS the_row
      WHERE  updated_at > timestamp
      AND    name <= 'Michael'
      ORDER  BY name
      LIMIT  1001
      ) sub
   )
SELECT (the_row).*  -- parentheses required
FROM   max1000 m
WHERE  total < 1001

UNION ALL
(  -- parentheses required
SELECT * 
FROM   T 
WHERE (SELECT total > 1000 FROM max1000 LIMIT 1)
ORDER  BY name
LIMIT  25
)

The subquery sub in CTE max1000 gets the complete, sorted result for the first query - wrapped as row type, and with LIMIT 1001 to avoid excess work.

The outer SELECT adds the total row count. See:

The first SELECT of the outer UNION query returns decomposed rows as result - if there are less than 1001 of them.

The second SELECT of the outer UNION query returns the alternate result - if there were more than 1000. Parentheses are required - see:

Or:

WITH max1000 AS (
   SELECT *
   FROM   T
   WHERE  updated_at > timestamp
   AND    name <= 'Michael'
   ORDER  BY name
   LIMIT  1001
   )
, ct(ok) AS (SELECT count(*) < 1001 FROM max1000)   

SELECT *
FROM   max1000 m
WHERE (SELECT ok FROM ct)

UNION ALL
(  -- parentheses required
SELECT * 
FROM   T 
WHERE (SELECT NOT ok FROM ct)
ORDER  BY name
LIMIT  25
);

I think I like the 2nd better. Not sure which is faster.

Either optimizes performance for less than 1001 rows in most calls. If that's the exception, I would first run a somewhat cheaper count. Also depends a lot on available indexes ...

You get no row if the first query finds no row. (Seems like an odd result.)

  • Related