SQL: Capturing first row in WHILE loop-CodePudding

I'm thinking my issue is based on how I've written the loop, but how it is written is how I've come to understand loops. So I'm really wondering if there is a way to consolidate this? As it sits, I have 2 sections: the first which captures the very first row, and it exists solely because my second section will capture the 2nd row up through the max row (as determined by @I).

Is there a way to get the first row inside the loop?

DECLARE @COUNTER TINYINT = 15

DECLARE @EndDate DATE= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-01') AS DATE)
DECLARE @StartDate DATE= DATEADD(MONTH, -12, @EndDate)


--THIS SECTION EXISTS SO I CAN CAPTURE THE FIRST ROW

       SELECT count = COUNT(DISTINCT value)
       FROM [TABLE]
       WHERE DATE >= @StartDate
             AND DATE < @EndDate


--THE LOOP BELOW CAPTURES THE SECOND ROW THROUGH THE REST OF THE ROWS BASED ON THE UPPER THRESHOLD OF @I

DECLARE @I TINYINT
SET @I = 0
WHILE @I <= @COUNTER
    BEGIN

        SET @I = @I   1
               SELECT count = COUNT(DISTINCT value)
               FROM [TABLE]
               WHERE DATE >= DATEADD(MONTH, -(@I), @StartDate)
                     AND DATE < DATEADD(MONTH, -(@I), @EndDate)
    END

EDIT 1:

Given the interest in the approach here, I thought I'd try to explain why I went with a loop as opposed to a set based query.

So here is my actual query:

DECLARE @EndDate DATE= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-01') AS DATE)
DECLARE @StartDate DATE= DATEADD(MONTH, -12, @EndDate)


       SELECT count = COUNT(DISTINCT o.ClinicLocationId)
       FROM [order].package p WITH(NOLOCK)
            INNER JOIN [order].[order] o WITH(NOLOCK) ON o.packageid = p.packageid
            INNER JOIN Profile.ClinicLocationInfo cli WITH(NOLOCK) ON cli.LocationId = o.ClinicLocationId
                AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
       WHERE CAST(p.ShipDTM AS DATE) >= @StartDate
             AND CAST(p.ShipDTM AS DATE) < @EndDate
             AND p.isshipped = 1
             AND o.IsShipped = 1
             AND ISNULL(o.iscanceled, 0) = 0

This gives me a count of 1670, which I know is correct because I have an older report with which to compare output. So when I add a date column to the SELECT statement, which is then also added to the GROUP BY, I get a list of numbers. You would think that by simply tallying the count column within those date ranges, you'd get the same value. But that is not what happens here. For just the first row, where I'd expect a tally of 1670, I'm actually getting 3956. I believe this is because of how Active is being calculated.

Active is determined by a rolling 12 month range. So for example, as of 7/1/2022 (with a starting date of 7/1/2021), there are 1670 locations. If I wanted to look to see how many Active locations there were as of 6/1/2022, I'd have to subtract a month from my @Start and @End to attain that rolling 12 month block. This is why I went with a loop, it seemed much easier to get my results this way. I just verified it takes 7 seconds to run for a 15-month span.

So given this further explanation, I'm curious if there would be a set-based solution for this? I did try the answer provided by Joel, but it did not produce correct numbers (understandable as he did not have more information that's now provided).

CodePudding user response：

One option is moving the SET @I = @I 1 line to after the rest of the loop body (and then also run for one iteration longer). In this way, the first adjustment for the dates is still 0. But don't do this.

I'm thinking my issue is based on how I've written the loop

It's not in how you've written the loop, but that a loop was written at all. Nearly every case where you want to use a loop in SQL there is a set-based alternative that is vastly more efficient... usually multiple orders of magnitude. This is no exception. Six-ten seconds is an eternity for a process like this; no reason it shouldn't finish almost instantly.

The code for that will look something like this:

WITH
    -- generate numbers
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
    -- project numbers as start and end dates
    Dates As (SELECT TOP 17 
                  DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, 
                  DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate 
              FROM Nums ORDER BY n)

SELECT d.StartDate, COUNT(DISTINCT value) as [count]
FROM [TABLE] t
-- use the dates to filter the table
INNER JOIN Dates d on t.[Date] > = d.StartDate and t.[Date] < d.EndDate
GROUP BY d.StartDate

Or I can show this as actually runnable code:

WITH
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
    Dates As (SELECT top 17 
                 DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, 
                 DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate 
              FROM Nums ORDER BY n)

SELECT StartDate, EndDate
FROM  Dates d

We can see this gives 17 results with same start and end values as a modified version of the original code in the question.

Update:

Now that we have the full original code, I can adapt my answer to use it:

WITH
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
    Dates As (SELECT TOP 17 
                  DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, 
                  DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate 
              FROM Nums ORDER BY n
)
SELECT d.StartDate, COUNT(DISTINCT o.ClinicLocationId) As [count]
FROM [order].package p
INNER JOIN [order].[order] o ON o.packageid = p.packageid
INNER JOIN Profile.ClinicLocationInfo cli ON cli.LocationId = o.ClinicLocationId
    AND cli.FacilityType IN ('CLINIC', 'HOSPITAL')
-- PLEASE tell me ShipDTM is a datetime value and not a varchar
INNER JOIN Dates d ON d.StartDate <= p.ShipDTM and p.ShipDTM < d.EndDate
WHERE p.IsShipped = 1 and o.IsShipped = 1 o.IsCanceled IS NULL
GROUP BY d.StartDate

Alternatively, if this still somehow gives you the wrong results (I think the GROUP BY will have fixed it), you can use an APPLY instead, like so (the JOIN/GROUP BY should still be faster):

WITH
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
    Dates As (SELECT TOP 17 
                  DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, 
                  DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate 
              FROM Nums ORDER BY n
)
SELECT d.StartDate, counts.[count]
FROM Dates d
CROSS APPLY (
    SELECT count = COUNT(DISTINCT o.ClinicLocationId)
    FROM [order].package p 
    INNER JOIN [order].[order] o ON o.packageid = p.packageid
    INNER JOIN Profile.ClinicLocationInfo cli ON cli.LocationId = o.ClinicLocationId
        AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
    WHERE p.ShipDTM >= d.StartDate
        AND p.ShipDTM < d.EndDate
        AND p.isshipped = 1
        AND o.IsShipped = 1
        AND IsCanceled IS NOT NULL
) counts

One final note here, regarding the ShipDTM column. I know you may not have any control over this, but the CAST() around that column makes it look like it's a varchar or similar. If it is, you should see if you can fix it, and I say "fix" because the schema really is considered broken.

As it is, you're likely converting every row in the table to a Date value — even rows you don't need. Thanks to internationalization issues, these conversion are not the simple or fast process you might expect; in fact converting between either date or numeric and string values is always something to avoid as much as possible. They also invalidate any index you might have on the column. Even worse, you are repeating these conversions for each iteration! No wonder the query runs for multiple seconds!

Almost much as the loop, these conversions are likely the source of the slowness. The good news is the the JOIN GROUP BY version of my solution should at least get you back to only needing to convert these values once. Fixing the column (because again: it is broken) will get yet another speed boost. I do understand this is likely to be either above your pay grade or a vendor system you can't change, but you should at least bring up the issue with someone who can influence this: either an architect/senior dev or the vendor directly.