I'm thinking my issue is based on how I've written the loop, but how it is written is how I've come to understand loops. So I'm really wondering if there is a way to consolidate this? As it sits, I have 2 sections: the first captures the very first row, and it exists solely because my second section captures the 2nd row up through the max row (as determined by @I).
Is there a way to get the first row inside the loop?
DECLARE @COUNTER TINYINT = 15
DECLARE @EndDate DATE= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-01') AS DATE)
DECLARE @StartDate DATE= DATEADD(MONTH, -12, @EndDate)
--THIS SECTION EXISTS SO I CAN CAPTURE THE FIRST ROW
SELECT count = COUNT(DISTINCT value)
FROM [TABLE]
WHERE DATE >= @StartDate
AND DATE < @EndDate
--THE LOOP BELOW CAPTURES THE SECOND ROW THROUGH THE REST OF THE ROWS BASED ON THE UPPER THRESHOLD OF @I
DECLARE @I TINYINT
SET @I = 0
WHILE @I <= @COUNTER
BEGIN
SET @I = @I + 1
SELECT count = COUNT(DISTINCT value)
FROM [TABLE]
WHERE DATE >= DATEADD(MONTH, -(@I), @StartDate)
AND DATE < DATEADD(MONTH, -(@I), @EndDate)
END
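To sanity-check the consolidation idea, here is a Python sketch of the same control flow (the dates and COUNTER mirror the T-SQL above; the anchor date is assumed for the demo). Offset 0 reproduces the standalone "first row" query, and incrementing at the end of the body (while running one iteration longer) keeps the remaining offsets identical to the original loop's:

```python
# A sketch of the consolidated loop, assuming the same dates and COUNTER
# as the T-SQL above. Offset 0 reproduces the standalone "first row"
# query; incrementing AFTER the body (and running one iteration longer)
# keeps the remaining offsets matching the original loop's.
from datetime import date

COUNTER = 15

def month_add(d, months):
    # Shift a first-of-month date by a whole number of months.
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

end_date = date(2022, 7, 1)            # first of the "current" month (assumed)
start_date = month_add(end_date, -12)  # rolling 12-month window

windows = []
i = 0
while i <= COUNTER + 1:                # one extra iteration, per the fix
    windows.append((month_add(start_date, -i), month_add(end_date, -i)))
    i += 1                             # increment at the END of the body

print(len(windows))   # 17 windows: offsets 0 through 16
print(windows[0])     # the original "first row" window
```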
EDIT 1:
Given the interest in the approach here, I thought I'd try to explain why I went with a loop as opposed to a set based query.
So here is my actual query:
DECLARE @EndDate DATE= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-01') AS DATE)
DECLARE @StartDate DATE= DATEADD(MONTH, -12, @EndDate)
SELECT count = COUNT(DISTINCT o.ClinicLocationId)
FROM [order].package p WITH(NOLOCK)
INNER JOIN [order].[order] o WITH(NOLOCK) ON o.packageid = p.packageid
INNER JOIN Profile.ClinicLocationInfo cli WITH(NOLOCK) ON cli.LocationId = o.ClinicLocationId
AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
WHERE CAST(p.ShipDTM AS DATE) >= @StartDate
AND CAST(p.ShipDTM AS DATE) < @EndDate
AND p.isshipped = 1
AND o.IsShipped = 1
AND ISNULL(o.iscanceled, 0) = 0
This gives me a count of 1670, which I know is correct because I have an older report against which to compare output. So when I add a date column to the SELECT statement, which is then also added to the GROUP BY, I get a list of numbers. You would think that simply tallying the count column within those date ranges would give the same value, but that is not what happens here. For just the first row, where I'd expect a tally of 1670, I'm actually getting 3956. I believe this is because of how Active is being calculated.
Active is determined by a rolling 12-month range. For example, as of 7/1/2022 (with a starting date of 7/1/2021), there are 1670 locations. If I wanted to see how many Active locations there were as of 6/1/2022, I'd have to subtract a month from my @StartDate and @EndDate to obtain that rolling 12-month block. This is why I went with a loop: it seemed much easier to get my results this way. I just verified it takes 7 seconds to run for a 15-month span.
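The overcount is exactly what happens when a location ships in more than one month of the window: COUNT(DISTINCT ...) over the whole 12 months counts it once, while grouping by month and then tallying counts it once per month. A tiny Python sketch with made-up data illustrates the effect:

```python
# Made-up toy data: (ship month, clinic id). clinic_A ships in two months.
orders = [
    ("2022-01", "clinic_A"),
    ("2022-02", "clinic_A"),   # same clinic, a second month
    ("2022-02", "clinic_B"),
    ("2022-03", "clinic_B"),
]

# COUNT(DISTINCT ...) over the whole window: each clinic counted once.
window_count = len({loc for _, loc in orders})

# Per-month distinct counts, then tallied: repeat clinics counted again.
months = sorted({m for m, _ in orders})
per_month = {m: len({loc for mm, loc in orders if mm == m}) for m in months}
tally = sum(per_month.values())

print(window_count)  # 2
print(tally)         # 4 -- the tally double-counts clinics that repeat
```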
So given this further explanation, I'm curious whether there is a set-based solution for this. I did try the answer provided by Joel, but it did not produce correct numbers (understandable, as he did not have the additional information that's now provided).
CodePudding user response:
One option is moving the SET @I = @I + 1 line to after the rest of the loop body (and then also running for one iteration longer). That way the first adjustment to the dates is still 0. But don't do this.
I'm thinking my issue is based on how I've written the loop
It's not in how you've written the loop, but that a loop was written at all. In nearly every case where you want to use a loop in SQL, there is a set-based alternative that is vastly more efficient, usually by multiple orders of magnitude. This is no exception. Seven seconds is an eternity for a process like this; there's no reason it shouldn't finish almost instantly.
The code for that will look something like this:
WITH
-- generate numbers
L0 AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
-- project numbers as start and end dates
Dates As (SELECT TOP 17
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate,
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate
FROM Nums ORDER BY n)
SELECT d.StartDate, COUNT(DISTINCT value) as [count]
FROM [TABLE] t
-- use the dates to filter the table
INNER JOIN Dates d ON t.[Date] >= d.StartDate AND t.[Date] < d.EndDate
GROUP BY d.StartDate
Or I can show this as actually runnable code:
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
Dates As (SELECT top 17
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate,
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate
FROM Nums ORDER BY n)
SELECT StartDate, EndDate
FROM Dates d
We can see this gives 17 results with the same start and end values as a modified version of the original code in the question.
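The same 17 rolling windows can also be computed outside SQL to sanity-check the Dates CTE. A Python sketch (the "current month" is pinned to an assumed date so the output is deterministic):

```python
# Compute the 17 (StartDate, EndDate) pairs the Dates CTE produces,
# with "now" pinned to an assumed date so the output is deterministic.
from datetime import date

def first_of_month(month_index):
    # month_index counts whole months since year 0.
    return date(month_index // 12, month_index % 12 + 1, 1)

today = date(2022, 7, 15)                        # assumed current date
this_month = today.year * 12 + (today.month - 1)

dates = [(first_of_month(this_month - n - 12),   # StartDate: 12 months back
          first_of_month(this_month - n))        # EndDate: exclusive bound
         for n in range(17)]

print(len(dates))  # 17 rolling windows
print(dates[0])    # current window: (2021-07-01, 2022-07-01)
```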
Update:
Now that we have the full original code, I can adapt my answer to use it:
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
Dates As (SELECT TOP 17
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate,
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate
FROM Nums ORDER BY n
)
SELECT d.StartDate, COUNT(DISTINCT o.ClinicLocationId) As [count]
FROM [order].package p
INNER JOIN [order].[order] o ON o.packageid = p.packageid
INNER JOIN Profile.ClinicLocationInfo cli ON cli.LocationId = o.ClinicLocationId
AND cli.FacilityType IN ('CLINIC', 'HOSPITAL')
-- PLEASE tell me ShipDTM is a datetime value and not a varchar
INNER JOIN Dates d ON d.StartDate <= p.ShipDTM and p.ShipDTM < d.EndDate
WHERE p.IsShipped = 1 AND o.IsShipped = 1 AND ISNULL(o.IsCanceled, 0) = 0
GROUP BY d.StartDate
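To see the JOIN + GROUP BY pattern in action, here is a runnable SQLite sketch over invented toy data (table and column names are made up): each order row joins to every window containing its ship date, and the distinct count is taken per window.

```python
# SQLite stand-in for the JOIN + GROUP BY pattern; table/column names
# and data are invented for the demo.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (ship_date TEXT, clinic_id TEXT);
INSERT INTO orders VALUES
  ('2022-06-15', 'A'), ('2022-06-20', 'A'),
  ('2022-05-10', 'B'), ('2021-08-01', 'C');
CREATE TABLE windows (start_date TEXT, end_date TEXT);
INSERT INTO windows VALUES
  ('2021-07-01', '2022-07-01'),   -- current rolling window
  ('2021-06-01', '2022-06-01');   -- one month back
""")

rows = con.execute("""
    SELECT w.start_date, COUNT(DISTINCT o.clinic_id) AS cnt
    FROM orders o
    JOIN windows w
      ON o.ship_date >= w.start_date AND o.ship_date < w.end_date
    GROUP BY w.start_date
    ORDER BY w.start_date
""").fetchall()

print(rows)  # [('2021-06-01', 2), ('2021-07-01', 3)]
```

Note that clinic A ships twice inside the current window but is still counted once per window, which is the behavior the loop was producing.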
Alternatively, if this still somehow gives you the wrong results (I think the GROUP BY will have fixed it), you can use an APPLY instead, like so (the JOIN/GROUP BY version should still be faster):
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3),
Dates As (SELECT TOP 17
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate,
DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate
FROM Nums ORDER BY n
)
SELECT d.StartDate, counts.[count]
FROM Dates d
CROSS APPLY (
SELECT count = COUNT(DISTINCT o.ClinicLocationId)
FROM [order].package p
INNER JOIN [order].[order] o ON o.packageid = p.packageid
INNER JOIN Profile.ClinicLocationInfo cli ON cli.LocationId = o.ClinicLocationId
AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
WHERE p.ShipDTM >= d.StartDate
AND p.ShipDTM < d.EndDate
AND p.isshipped = 1
AND o.IsShipped = 1
AND ISNULL(o.IsCanceled, 0) = 0
) counts
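SQLite has no CROSS APPLY, but the same per-window pattern can be sketched with a correlated subquery over made-up toy data:

```python
# SQLite has no CROSS APPLY, so a correlated subquery stands in for the
# per-window count. Data and names are invented for the demo.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (ship_date TEXT, clinic_id TEXT);
INSERT INTO orders VALUES
  ('2022-06-15', 'A'), ('2022-06-20', 'A'),
  ('2022-05-10', 'B'), ('2021-08-01', 'C');
CREATE TABLE windows (start_date TEXT, end_date TEXT);
INSERT INTO windows VALUES
  ('2021-07-01', '2022-07-01'),
  ('2021-06-01', '2022-06-01');
""")

rows = con.execute("""
    SELECT w.start_date,
           (SELECT COUNT(DISTINCT o.clinic_id)
              FROM orders o
             WHERE o.ship_date >= w.start_date
               AND o.ship_date <  w.end_date) AS cnt
    FROM windows w
    ORDER BY w.start_date
""").fetchall()

print(rows)  # [('2021-06-01', 2), ('2021-07-01', 3)]
```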
One final note here, regarding the ShipDTM column. I know you may not have any control over this, but the CAST() around that column makes it look like it's a varchar or similar. If it is, you should see if you can fix it, and I say "fix" because the schema really is considered broken.
As it is, you're likely converting every row in the table to a Date value, even rows you don't need. Thanks to internationalization issues, these conversions are not the simple or fast process you might expect; in fact, converting between either date or numeric and string values is always something to avoid as much as possible. The conversions also prevent the use of any index you might have on the column. Even worse, you are repeating them for each iteration of the loop! No wonder the query runs for multiple seconds.
Almost as much as the loop, these conversions are likely the source of the slowness. The good news is that the JOIN/GROUP BY version of my solution should at least get you back to converting these values only once. Fixing the column (because again: it is broken) will give you yet another speed boost. I do understand this is likely to be either above your pay grade or in a vendor system you can't change, but you should at least raise the issue with someone who can influence it: either an architect/senior dev or the vendor directly.