Home > OS >  How to filter an ordered table until the first row where a threshold is reached?
How to filter an ordered table until the first row where a threshold is reached?

Time:01-10

I'm using PostgreSQL and I have a table with the following columns: id, distance, and length. I want to order the table by distance and create a new column called cum_length using a window function. I also want to filter the rows so that only rows until cum_length value cross a certain threshold are included in the final result.

Example of input table:

id distance length
1 10 1
2 5 2
3 8 1
4 1 3
5 3 2
6 9 2

Desired output for a threshold of 6:

id distance length cum_length
4 1 3 3
5 3 2 5
2 5 2 7

This is the SQL that I came up with:

WITH ordered_table AS (
  SELECT id,
    distance,
    length,
    SUM(length) OVER (ORDER BY distance) AS cum_length
  FROM table)
SELECT *
FROM ordered_table
WHERE cum_length <= 6

But this omits the last row of the desired result.

CodePudding user response:

Here is it based on your ordered_table query and an extra lag window function to calculate previous_is_less than the threshold value. Records where previous_is_less is true or null qualify for selection.

with t as 
(
 select *, 
    lag(cum_length) over (order by cum_length) < 6 as previous_is_less
 from  
 (
  SELECT id, distance, length,
     SUM(length) OVER (order BY distance) AS cum_length
  from the_table
 ) as ordered_table
)
select id, distance, length, cum_length
from t 
where coalesce(previous_is_less, true)
order by cum_length;

DB-Fiddle demo

CodePudding user response:

Try the following:

WITH ordered_table AS 
(
  SELECT id, distance, length,
    SUM(length) OVER (ORDER BY distance) AS cum_length
  FROM table_name
)
SELECT id, distance, length, cum_length
FROM ordered_table
WHERE cum_length <= COALESCE((SELECT MIN(cum_length) FROM ordered_table WHERE cum_length > 6), 6)

For your sample data, this is equivalent to WHERE cum_length <= 7.

See demo

CodePudding user response:

May be interesting to compare with this one on large data set, only one sorting by distance and should stop scanning data as soon as the threshold is found

with data(id, distance, length) as (
    select 1, 10, 1 FROM DUAL UNION ALL
    select 2, 5, 2 FROM DUAL UNION ALL
    select 3, 8, 1 FROM DUAL UNION ALL
    select 4, 1, 3 FROM DUAL UNION ALL
    select 5, 3, 2 FROM DUAL UNION ALL
    select 6, 9, 2 FROM DUAL -- UNION ALL
),
rdata(id, distance, length, rn) as (
    select id, distance, length, row_number() over(order by distance) as rn
    from data 
),
recdata(id, distance, length, rn, cumlength) as ( 
    select id, distance, length, rn, length
    from rdata d
    where rn = 1
    
    union all
    
    select d.id, d.distance, d.length, d.rn, d.length   r.cumlength
    from recdata r
    join rdata d on d.rn = r.rn   1 and r.cumlength <= 6 
)
select id, distance, length, cumlength from recdata
;
  • Related