SQL Server - Get row based on a sum-CodePudding

I need to calculate 5 working days from a given date, based on the table below.

5 working days from Jan 9 is Jan 16 because the sum of the working_days column below between those dates is 5.

Here is SQL that I used.

WITH dates AS
(
    SELECT t_from.start_date, t_to.start_date end_date
    FROM #t t_from, #t t_to
    WHERE t_from.start_date < t_to.start_date 
),
sum_days AS 
(
    SELECT
        start_date, end_date,
        (SELECT SUM(t_sum.working_days) 
         FROM #t t_sum 
         WHERE t_sum.start_date BETWEEN d.start_date AND d.end_date) tot_days
    FROM
        dates d
)
SELECT
    start_date, MAX(end_date) end_date
FROM
    sum_days
WHERE
    tot_days = 5
GROUP BY
    start_date

It works, but it is inefficient. The real table that I'm using has 1,000 rows, and it takes over 1 minute for the query to return.

My question: is there a better way?

Input:

start_date	working_days
2023-01-09	1
2023-01-10	1
2023-01-11	1
2023-01-12	1
2023-01-13	1
2023-01-14	0
2023-01-15	0
2023-01-16	0
2023-01-17	1
2023-01-18	1
2023-01-19	1
2023-01-20	1
2023-01-21	0
2023-01-22	0
2023-01-23	1
2023-01-24	1

Desired output:

start_date	end_date
2023-01-09	2023-01-16
2023-01-10	2023-01-17
2023-01-11	2023-01-18
2023-01-12	2023-01-19
2023-01-13	2023-01-22
2023-01-14	2023-01-23
2023-01-15	2023-01-23
2023-01-16	2023-01-23
2023-01-17	2023-01-23
2023-01-18	2023-01-24

SQL to create the table:

drop table if exists #t;
GO

select '2023-01-09' start_date,1 working_days into #t;
GO

insert into #t values('2023-01-10',1) ;
go 

insert into #t values('2023-01-11',1);
insert into #t values('2023-01-12',1);
insert into #t values('2023-01-13',1);
insert into #t values('2023-01-14',0);
insert into #t values('2023-01-15',0);
insert into #t values('2023-01-16',0);
insert into #t values('2023-01-17',1);
insert into #t values('2023-01-18',1);
insert into #t values('2023-01-19',1);
insert into #t values('2023-01-20',1);
insert into #t values('2023-01-21',0);
insert into #t values('2023-01-22',0);
insert into #t values('2023-01-23',1);
insert into #t values('2023-01-24',1);
go

CodePudding user response：

FROM #t t_from, #t t_to
where t_from.start_date < t_to.start_date

is a "triangular" join. It is not quite as bad as a cross join but getting that way (rows returned are N*(N-1)/2 rather than N*N).

This will not scale with large numbers of rows in #t.

One way of getting your desired results (db fiddle) is

WITH Dates
     AS (SELECT *,
                sum(working_days)
                  OVER (
                    ORDER BY start_date) AS working_day_count
         FROM   #t)
SELECT D1.start_date,
       MAX(D2.start_date)
FROM   Dates D1
       JOIN Dates D2
         ON D1.working_day_count   5 - D1.working_days = D2.working_day_count
GROUP  BY D1.start_date

This calculates the running total efficiently. Potentially a solution will be provided that does it all in one pass rather than requiring the self join above but this is at least an equi join and should be a lot faster in your 1,000 row case than your current method.

CodePudding user response：

Out of curiosity, look at the solution without JOIN

with cte as (
select start_date,working_days
  ,lead(start_date,1)over(order by start_date ) d1
  ,lead(start_date,4)over(order by start_date ) d2 -- target date
  ,lead(start_date,5)over(order by start_date ) d3 --target 1, if not_working_days between d2 and d3
from (select * from #t where working_days=1) t -- dates, only working_days
)
,t2 as(
select * 
 ,case when datediff(d,d2,d3)>1 then dateadd(d,-1,d3)
       else d2
  end end_date
 ,datediff(d,start_date,d1) dn
from cte
)
select  
    dateadd(d,isnull(n,0),start_date) start_date
   ,case when isnull(n,0)=0 then working_days else 0 end working_days
   ,case when isnull(n,0)=0 then end_date else dateadd(d,1,end_date) end end_date
from t2 left join (values(0),(1),(2),(3),(4),(5))nn(n) --to restore not_working_days
      on nn.n<t2.dn

If there is an opportunity to compare the cost, it will be interesting.
Test example

CodePudding user response：

with data as (
    select start_date as dt, working_days as adj,
        sum(cast(working_days as int)) over (order by start_date) as ofs
    from #t
)
select ds.dt as start_date, de.dt as end_date into #d3
from data ds cross apply (
    select max(dt) as end_date
    from data de
    where de.ofs = ds.ofs   5 - ds.adj
) de(dt)
where de.dt is not null;

Basically the same as above but cross apply might be improvement.

Or you could just search via lead():

with data as (
    select start_date as dt, working_days as adj,
        sum(cast(working_days as int)) over (order by start_date) as ofs
    from #t
), data2 as (
    select dt as start_date,
        case ofs   5 - adj -- check in reverse order
            when lead(ofs, 9) over (order by dt) then lead(dt,  9) over (order by dt)
            when lead(ofs, 8) over (order by dt) then lead(dt,  8) over (order by dt)
            when lead(ofs, 7) over (order by dt) then lead(dt,  7) over (order by dt)
            when lead(ofs, 6) over (order by dt) then lead(dt,  6) over (order by dt) 
        end as end_date
    from data
)
select * into #d4
from data2
where end_date is not null;

This query assumes at least two days off per week and a maximum of four-day weekends to limit the number of dates that need to be searched. Expand as necessary.

Check out the fiddle here with a demonstration that both of these approaches seem to generate cheaper plans: https://dbfiddle.uk/QzhL50K-

There is a way to use a single pass.

select start_date,
    dateadd(day,
        case 5
            when sum(working_days) over (order by start_date rows between current row and 9 following) then 9
            when sum(working_days) over (order by start_date rows between current row and 8 following) then 8
            when sum(working_days) over (order by start_date rows between current row and 7 following) then 7
            when sum(working_days) over (order by start_date rows between current row and 6 following) then 6
         end,
         start_date) as end_date
from #t;

This doesn't handle removal of dates near the end of the table within the five-day horizon.