I need to calculate 5 working days from a given date, based on the table below.
5 working days from Jan 9 is Jan 16 because the sum of the working_days
column below between those dates is 5.
Here is SQL that I used.
WITH dates AS
(
SELECT t_from.start_date, t_to.start_date end_date
FROM #t t_from, #t t_to
WHERE t_from.start_date < t_to.start_date
),
sum_days AS
(
SELECT
start_date, end_date,
(SELECT SUM(t_sum.working_days)
FROM #t t_sum
WHERE t_sum.start_date BETWEEN d.start_date AND d.end_date) tot_days
FROM
dates d
)
SELECT
start_date, MAX(end_date) end_date
FROM
sum_days
WHERE
tot_days = 5
GROUP BY
start_date
It works, but it is inefficient. The real table that I'm using has 1,000 rows, and it takes over 1 minute for the query to return.
My question: is there a better way?
Input:
start_date | working_days |
---|---|
2023-01-09 | 1 |
2023-01-10 | 1 |
2023-01-11 | 1 |
2023-01-12 | 1 |
2023-01-13 | 1 |
2023-01-14 | 0 |
2023-01-15 | 0 |
2023-01-16 | 0 |
2023-01-17 | 1 |
2023-01-18 | 1 |
2023-01-19 | 1 |
2023-01-20 | 1 |
2023-01-21 | 0 |
2023-01-22 | 0 |
2023-01-23 | 1 |
2023-01-24 | 1 |
Desired output:
start_date | end_date |
---|---|
2023-01-09 | 2023-01-16 |
2023-01-10 | 2023-01-17 |
2023-01-11 | 2023-01-18 |
2023-01-12 | 2023-01-19 |
2023-01-13 | 2023-01-22 |
2023-01-14 | 2023-01-23 |
2023-01-15 | 2023-01-23 |
2023-01-16 | 2023-01-23 |
2023-01-17 | 2023-01-23 |
2023-01-18 | 2023-01-24 |
SQL to create the table:
drop table if exists #t;
GO
select '2023-01-09' start_date,1 working_days into #t;
GO
insert into #t values('2023-01-10',1) ;
go
insert into #t values('2023-01-11',1);
insert into #t values('2023-01-12',1);
insert into #t values('2023-01-13',1);
insert into #t values('2023-01-14',0);
insert into #t values('2023-01-15',0);
insert into #t values('2023-01-16',0);
insert into #t values('2023-01-17',1);
insert into #t values('2023-01-18',1);
insert into #t values('2023-01-19',1);
insert into #t values('2023-01-20',1);
insert into #t values('2023-01-21',0);
insert into #t values('2023-01-22',0);
insert into #t values('2023-01-23',1);
insert into #t values('2023-01-24',1);
go
CodePudding user response:
FROM #t t_from, #t t_to
where t_from.start_date < t_to.start_date
is a "triangular" join. It is not quite as bad as a cross join but getting that way (rows returned are N*(N-1)/2
rather than N*N
).
This will not scale with large numbers of rows in #t
.
One way of getting your desired results (db fiddle) is
WITH Dates
AS (SELECT *,
sum(working_days)
OVER (
ORDER BY start_date) AS working_day_count
FROM #t)
SELECT D1.start_date,
MAX(D2.start_date)
FROM Dates D1
JOIN Dates D2
ON D1.working_day_count 5 - D1.working_days = D2.working_day_count
GROUP BY D1.start_date
This calculates the running total efficiently. Potentially a solution will be provided that does it all in one pass rather than requiring the self join above but this is at least an equi join and should be a lot faster in your 1,000 row case than your current method.
CodePudding user response:
Out of curiosity, look at the solution without JOIN
with cte as (
select start_date,working_days
,lead(start_date,1)over(order by start_date ) d1
,lead(start_date,4)over(order by start_date ) d2 -- target date
,lead(start_date,5)over(order by start_date ) d3 --target 1, if not_working_days between d2 and d3
from (select * from #t where working_days=1) t -- dates, only working_days
)
,t2 as(
select *
,case when datediff(d,d2,d3)>1 then dateadd(d,-1,d3)
else d2
end end_date
,datediff(d,start_date,d1) dn
from cte
)
select
dateadd(d,isnull(n,0),start_date) start_date
,case when isnull(n,0)=0 then working_days else 0 end working_days
,case when isnull(n,0)=0 then end_date else dateadd(d,1,end_date) end end_date
from t2 left join (values(0),(1),(2),(3),(4),(5))nn(n) --to restore not_working_days
on nn.n<t2.dn
If there is an opportunity to compare the cost, it will be interesting.
Test example
CodePudding user response:
with data as (
select start_date as dt, working_days as adj,
sum(cast(working_days as int)) over (order by start_date) as ofs
from #t
)
select ds.dt as start_date, de.dt as end_date into #d3
from data ds cross apply (
select max(dt) as end_date
from data de
where de.ofs = ds.ofs 5 - ds.adj
) de(dt)
where de.dt is not null;
Basically the same as above but cross apply might be improvement.
Or you could just search via lead()
:
with data as (
select start_date as dt, working_days as adj,
sum(cast(working_days as int)) over (order by start_date) as ofs
from #t
), data2 as (
select dt as start_date,
case ofs 5 - adj -- check in reverse order
when lead(ofs, 9) over (order by dt) then lead(dt, 9) over (order by dt)
when lead(ofs, 8) over (order by dt) then lead(dt, 8) over (order by dt)
when lead(ofs, 7) over (order by dt) then lead(dt, 7) over (order by dt)
when lead(ofs, 6) over (order by dt) then lead(dt, 6) over (order by dt)
end as end_date
from data
)
select * into #d4
from data2
where end_date is not null;
This query assumes at least two days off per week and a maximum of four-day weekends to limit the number of dates that need to be searched. Expand as necessary.
Check out the fiddle here with a demonstration that both of these approaches seem to generate cheaper plans: https://dbfiddle.uk/QzhL50K-
There is a way to use a single pass.
select start_date,
dateadd(day,
case 5
when sum(working_days) over (order by start_date rows between current row and 9 following) then 9
when sum(working_days) over (order by start_date rows between current row and 8 following) then 8
when sum(working_days) over (order by start_date rows between current row and 7 following) then 7
when sum(working_days) over (order by start_date rows between current row and 6 following) then 6
end,
start_date) as end_date
from #t;
This doesn't handle removal of dates near the end of the table within the five-day horizon.