Home > Blockchain >  SQL limit with spread
SQL limit with spread

Time:12-18

Is there a way to limit a selection like with LIMIT, but instead of returning the limitation with an offset, limit with a "spread".

So for instance if a select returns 1000 rows and I limit it to 100, then I get every 10th row from start to end.

I know this would require the full SELECT to be performed since the RDBMS would need to go through all rows to do so. But instead of, for instance, returning 100000 rows when I need every 100th row, there would be a lot less transfer and the work could be done at the RDBMS.

I am requiring this on a PostgreSQL DB.

CodePudding user response:

There is no built-in syntax to do that in connection with LIMIT / OFFSET (nor with standard-SQL FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }]).

You can achieve your goal with the modulo operator %:

SELECT *
FROM (
   SELECT row_number() OVER () AS rn, ... original SELECT list
   FROM ... -- original query
   ) sub
WHERE rn = 0  -- every 10th row

Since there is no ORDER BY in the window definition, row numbers are assigned according to the ORDER BY of the query.
If there is no ORDER BY at all, you get an arbitrary order of rows. That's still some kind of order, the result is just out of your hands.

You can apply that kind of filter on an individual table with the TABLESAMPLE syntax.

SELECT * FROM tbl TABLESAMPLE SYSTEM (10);  -- roughly 10 %

Or:

SELECT * FROM tbl TABLESAMPLE BERNOULLI (10);  -- roughly 10 %

SYSTEM is faster, BERNOULLI is more random.

You could even apply the TABLESAMPLE filter on multiple tables in the same query, like:

SELECT *
FROM tbl1 TABLESAMPLE SYSTEM (10)
JOIN tbl2 TABLESAMPLE BERNOULLI (10) USING (big_id);

But the number of resulting rows can vary wildly. To get a given number of rows, consider the additional module tsm_system_rows instead. See:

  • Related