Sum unique day count within a date ranges-CodePudding

I have a list of duplicate date ranges where I need to sum all dates unique to a campaign dates, without duplicating overlapping days.

For example in the data below, there are days in campaign_a running between 08/07/2022 and 15/07/2022 which overlap the 11/07/2022 and 13/07/2022 which need to not duplicate when summing.

The summing also needs to take the campaign name as a conditional.

campaign	days	line_start	line_end
campaign_a	108	14/07/2022	30/10/2022
campaign_a	61	31/10/2022	31/12/2022
campaign_a	2	11/07/2022	13/07/2022
campaign_a	2	8/07/2022	15/07/2022
campaign_a	108	14/07/2022	30/10/2022
campaign_a	61	31/10/2022	31/12/2022
campaign_a	2	11/07/2022	13/07/2022
campaign_a	2	8/07/2022	10/07/2022
campaign_b	108	14/07/2022	30/10/2022
campaign_b	61	31/10/2022	31/12/2022
campaign_b	2	11/07/2022	13/07/2022
campaign_b	2	8/07/2022	10/07/2022
campaign_b	108	14/07/2022	30/10/2022
campaign_b	61	31/10/2022	31/12/2022
campaign_b	2	11/07/2022	13/07/2022
campaign_b	2	8/07/2022	10/07/2022
campaign_b	108	14/07/2022	30/10/2022
campaign_b	61	31/10/2022	31/12/2022
campaign_b	2	11/07/2022	13/07/2022
campaign_b	2	8/07/2022	10/07/2022

CodePudding user response：

Again, assuming "campaign_a", for example, in G2:

=LET(
    δ, A$2:D$21,
    ζ, FILTER(δ, INDEX(δ, , 1) = G2),
    α, INDEX(ζ, , 3),
    β, INDEX(ζ, , 4),
    ξ, SEQUENCE(MAX(β) - MIN(α)   1, , MIN(α)),
    γ, BYROW(ξ, LAMBDA(λ, SUM(BYROW(CHOOSE({1, 2}, α, β), 
       LAMBDA(κ, N(MEDIAN(κ, λ) = λ)))))),
       COUNT(FILTER(ξ, γ > 0))
)

Copy down to give similar results for campaigns in G3, G4, etc.

CodePudding user response：

Basically, this problem is similar to calculating the project's total duration (in days). Where in this case each campaign's data can be assimilated as project tasks with start and end dates. The only difference here is that gaps don't contribute to project duration.

Let's illustrate it with the following example, since dates are represented in Excel as a positive integer, to simplify the calculation we can use just numbers as follow:

Grom the graph is easy to calculate that the total duration will be 13 days. Notice that the gap in columns K and P doesn't count for the total duration.

Based on this graphical representation we can build a logic for calculating the durations as we progress from the top of the Gantt chart to the bottom.

The logic to calculate the duration is based on identifying overlapping duration for each task. The condition to identify two intervals A,B overlap is the following:

AND(startA <= endB, endA >= startB)

we are going to iterate over all tasks and on each step update the new start and end dates and update the duration. Let's say we are going to calculate the new start and end dates for two consecutive rows represented by A and B intervals. The combined interval will be:

[MIN(startA, startB), MAX(endA, endB)]

If the previous duration is represented by the variable duration, then the updated duration will be:

IF(endB > endA, duration   (endB - endA), duration)))

i.e, if the next interval (B) ends after A, then we need to update the duration with the difference (endB - endA).

If the intervals don't overlap, then the duration will be updated as follow:

duration = duration   endB - startB   1

and the next start and end date will be the startB, endB for the next calculation.

The above process can now be put in Excel terms as follow:

=LET(set, $A$2:$C$21, campaing, E2, campaings, INDEX(set,,1),
 subset, SORT(FILTER(set, campaings=campaing),2),starts, INDEX(subset,,2),
 ends, INDEX(subset,,3),intervals, starts&","&ends,
 SPLIT, LAMBDA(x, 1*TEXTSPLIT(x,",")), JOIN, LAMBDA(y, TEXTJOIN(",",,y)),
 calc, REDUCE("0,0,0", intervals, LAMBDA(acc,i, LET(accValues, SPLIT(acc),
   startA, INDEX(accValues,1,1), endA, INDEX(accValues,,2),
   duration, INDEX(accValues,1,3), values, SPLIT(i),
   startB, INDEX(values,1,1),  endB, INDEX(values,,2),
   IF(duration=0,JOIN(HSTACK(startB, endB, endB-startB 1)),
    IF(AND(startA <= endB, endA >= startB),
      JOIN(HSTACK(MIN(startA, startB), MAX(endA, endB),
        IF(endB > endA, duration   (endB-endA), duration))),
      JOIN(HSTACK(startB, endB, duration   endB-startB 1))
    ))
 ))), INDEX(SPLIT(calc),,3)
)

and here is the output:

and here is the output for both campaigns based on the input data from the question: