Home > Software design >  Grouping rows by fixed integer window that resets above a threshold
Grouping rows by fixed integer window that resets above a threshold

Time:05-17

I am trying to group rows together using a counter of elapsed days from the previous row. These groups would represent a period of 180 elapsed days and no more. If a new row would break this rule (i.e. be >180 days past the first row in the group), it becomes the new first row in a group and a new 180 day starts from it. And to add one more layer, we're also grouping across customers, so .

For example, given the following data:

elapsedDaysSinceLastRow customer rowDate
0 X \\\
90 X \\\
120 X \\\
100 X \\\
100 3 X
0 Y \\\
45 Y \\\
120 Y \\\
150 Y \\\
400 Y \\\

We'd want our SQL to generate the following grouping column:

elapsedDaysSinceLastRow grouping customer rowDate
0 1 X \\\
90 1 X \\\
120 2 X \\\
100 2 X \\\
100 3 X \\\
0 1 Y \\\
45 1 Y \\\
120 1 Y \\\
150 2 Y \\\
400 3 Y \\\

So far, we haven't been able to figure out how to do this with windowing functions. It seems like only recursion has worked.

Unfortunately, the recursion is extremely slow and won't work for our use case.

Are there any recommendations for how we could change the structure of our data to support a window-function approach or how we could speed up our recursive query?

Our recursive query is as follows:

daysOngoing AS (
(
SELECT *
,1 AS firstIn180Days
FROM tblAbove i
WHERE transactionNumber = 1
)
UNION ALL
(
SELECT i.*
,CASE WHEN (CASE WHEN d.firstIn180Days = 1 AND (i.elapsedDaysSinceLastRow > 180) THEN 0
WHEN d.firstIn180Days = 0 AND (d.elapsedDaysSinceLastRow   i.elapsedDaysSinceLastRow > 180) THEN 0
WHEN d.firstIn180Days = 1 AND (i.elapsedDaysSinceLastRow < 180) THEN i.elapsedDaysSinceLastRow
WHEN d.firstIn180Days = 0 AND (d.elapsedDaysSinceLastRow   i.elapsedDaysSinceLastRow < 180) THEN d.elapsedDaysSinceLastRow   i.elapsedDaysSinceLastRow
ELSE 0
END) > 0 THEN 0
WHEN d.rowDate = i.rowDate THEN 0
ELSE 1
END AS firstIn180Days
FROM tblAbove i
JOIN daysOngoing d
ON i.journeyId = d.journeyId AND i.transactionNumber = d.transactionNumber   1
)
)
SELECT elapsedDaysSinceLastRow
,SUM(firstIn180Days) OVER (PARTITION BY customer ORDER BY customer, rowDate ASC) grouping
,customer
,rowDate
FROM daysOngoing

CodePudding user response:

The following provides your desired results for the sample data, making the assumption rowdate can be used for ordering; does this work for you?

with m as (
    select *,
      case when Sum(elapsedDaysSinceLastRow)
          over(partition by customer order by rowdate) % 180 < elapsedDaysSinceLastRow 
        then 1 else 0 
      end rs
    from t
)
select elapsedDaysSinceLastRow, 
  Sum(rs)over(partition by customer order by rowdate)   1 [grouping], 
  customer, rowdate
from m;

CodePudding user response:

One option could be to use the ROWS UNBOUNDED PRECEDING frame specification inside a SUM window function over the field elapsedDaysSinceLastRow and partitioning on the customer field. Once you get the cumulative sum, you can divide by 180 and add 1 to get the increasing value you need.

SELECT elapsedDaysSinceLastRow,
       (SUM(elapsedDaysSinceLastRow) OVER(PARTITION BY customer 
                                          ORDER     BY customer
                                          ROWS UNBOUNDED PRECEDING))/180 1, 
       customer,
       rowDate
FROM tab

Quite efficient, though it comes with a drawback (previously pointed in the comments): since we divide by 180, if the difference between one row and the consecutive is bigger than 180 (as it happens for customer 2 on last row), you will get skipped values for this grouping calculated field (1>1>1>2>4 instead of 1>1>1>2>3).

Try it here.

  • Related