I'm using PostgreSQL and I have a table
with the following columns: id
, distance
, and length
. I want to order the table by distance
and create a new column called cum_length
using a window function. I also want to filter the rows so that only rows until cum_length
value cross a certain threshold are included in the final result.
Example of input table
:
id | distance | length |
---|---|---|
1 | 10 | 1 |
2 | 5 | 2 |
3 | 8 | 1 |
4 | 1 | 3 |
5 | 3 | 2 |
6 | 9 | 2 |
Desired output for a threshold of 6
:
id | distance | length | cum_length |
---|---|---|---|
4 | 1 | 3 | 3 |
5 | 3 | 2 | 5 |
2 | 5 | 2 | 7 |
This is the SQL that I came up with:
WITH ordered_table AS (
SELECT id,
distance,
length,
SUM(length) OVER (ORDER BY distance) AS cum_length
FROM table)
SELECT *
FROM ordered_table
WHERE cum_length <= 6
But this omits the last row of the desired result.
CodePudding user response:
Here is it based on your ordered_table
query and an extra lag
window function to calculate previous_is_less
than the threshold value. Records where previous_is_less
is true or null qualify for selection.
with t as
(
select *,
lag(cum_length) over (order by cum_length) < 6 as previous_is_less
from
(
SELECT id, distance, length,
SUM(length) OVER (order BY distance) AS cum_length
from the_table
) as ordered_table
)
select id, distance, length, cum_length
from t
where coalesce(previous_is_less, true)
order by cum_length;
DB-Fiddle demo
CodePudding user response:
Try the following:
WITH ordered_table AS
(
SELECT id, distance, length,
SUM(length) OVER (ORDER BY distance) AS cum_length
FROM table_name
)
SELECT id, distance, length, cum_length
FROM ordered_table
WHERE cum_length <= COALESCE((SELECT MIN(cum_length) FROM ordered_table WHERE cum_length > 6), 6)
For your sample data, this is equivalent to WHERE cum_length <= 7
.
CodePudding user response:
May be interesting to compare with this one on large data set, only one sorting by distance and should stop scanning data as soon as the threshold is found
with data(id, distance, length) as (
select 1, 10, 1 FROM DUAL UNION ALL
select 2, 5, 2 FROM DUAL UNION ALL
select 3, 8, 1 FROM DUAL UNION ALL
select 4, 1, 3 FROM DUAL UNION ALL
select 5, 3, 2 FROM DUAL UNION ALL
select 6, 9, 2 FROM DUAL -- UNION ALL
),
rdata(id, distance, length, rn) as (
select id, distance, length, row_number() over(order by distance) as rn
from data
),
recdata(id, distance, length, rn, cumlength) as (
select id, distance, length, rn, length
from rdata d
where rn = 1
union all
select d.id, d.distance, d.length, d.rn, d.length r.cumlength
from recdata r
join rdata d on d.rn = r.rn 1 and r.cumlength <= 6
)
select id, distance, length, cumlength from recdata
;