I am trying to group rows together using a counter of elapsed days from the previous row. These groups would represent a period of 180 elapsed days and no more. If a new row would break this rule (i.e. be >180 days past the first row in the group), it becomes the new first row in a group and a new 180 day starts from it. And to add one more layer, we're also grouping across customers, so .
For example, given the following data:
elapsedDaysSinceLastRow | customer | rowDate |
---|---|---|
0 | X | \\\ |
90 | X | \\\ |
120 | X | \\\ |
100 | X | \\\ |
100 | 3 | X |
0 | Y | \\\ |
45 | Y | \\\ |
120 | Y | \\\ |
150 | Y | \\\ |
400 | Y | \\\ |
We'd want our SQL to generate the following grouping column:
elapsedDaysSinceLastRow | grouping | customer | rowDate |
---|---|---|---|
0 | 1 | X | \\\ |
90 | 1 | X | \\\ |
120 | 2 | X | \\\ |
100 | 2 | X | \\\ |
100 | 3 | X | \\\ |
0 | 1 | Y | \\\ |
45 | 1 | Y | \\\ |
120 | 1 | Y | \\\ |
150 | 2 | Y | \\\ |
400 | 3 | Y | \\\ |
So far, we haven't been able to figure out how to do this with windowing functions. It seems like only recursion has worked.
Unfortunately, the recursion is extremely slow and won't work for our use case.
Are there any recommendations for how we could change the structure of our data to support a window-function approach or how we could speed up our recursive query?
Our recursive query is as follows:
daysOngoing AS (
(
SELECT *
,1 AS firstIn180Days
FROM tblAbove i
WHERE transactionNumber = 1
)
UNION ALL
(
SELECT i.*
,CASE WHEN (CASE WHEN d.firstIn180Days = 1 AND (i.elapsedDaysSinceLastRow > 180) THEN 0
WHEN d.firstIn180Days = 0 AND (d.elapsedDaysSinceLastRow i.elapsedDaysSinceLastRow > 180) THEN 0
WHEN d.firstIn180Days = 1 AND (i.elapsedDaysSinceLastRow < 180) THEN i.elapsedDaysSinceLastRow
WHEN d.firstIn180Days = 0 AND (d.elapsedDaysSinceLastRow i.elapsedDaysSinceLastRow < 180) THEN d.elapsedDaysSinceLastRow i.elapsedDaysSinceLastRow
ELSE 0
END) > 0 THEN 0
WHEN d.rowDate = i.rowDate THEN 0
ELSE 1
END AS firstIn180Days
FROM tblAbove i
JOIN daysOngoing d
ON i.journeyId = d.journeyId AND i.transactionNumber = d.transactionNumber 1
)
)
SELECT elapsedDaysSinceLastRow
,SUM(firstIn180Days) OVER (PARTITION BY customer ORDER BY customer, rowDate ASC) grouping
,customer
,rowDate
FROM daysOngoing
CodePudding user response:
The following provides your desired results for the sample data, making the assumption rowdate
can be used for ordering; does this work for you?
with m as (
select *,
case when Sum(elapsedDaysSinceLastRow)
over(partition by customer order by rowdate) % 180 < elapsedDaysSinceLastRow
then 1 else 0
end rs
from t
)
select elapsedDaysSinceLastRow,
Sum(rs)over(partition by customer order by rowdate) 1 [grouping],
customer, rowdate
from m;
CodePudding user response:
One option could be to use the ROWS UNBOUNDED PRECEDING
frame specification inside a SUM
window function over the field elapsedDaysSinceLastRow
and partitioning on the customer
field. Once you get the cumulative sum, you can divide by 180 and add 1 to get the increasing value you need.
SELECT elapsedDaysSinceLastRow,
(SUM(elapsedDaysSinceLastRow) OVER(PARTITION BY customer
ORDER BY customer
ROWS UNBOUNDED PRECEDING))/180 1,
customer,
rowDate
FROM tab
Quite efficient, though it comes with a drawback (previously pointed in the comments): since we divide by 180, if the difference between one row and the consecutive is bigger than 180 (as it happens for customer 2 on last row), you will get skipped values for this grouping
calculated field (1>1>1>2>4 instead of 1>1>1>2>3).
Try it here.