I have a table that contains 3 columns: personID, weeknumber, and event. Event is 0 if there was no event for that person in that week and 1 if there was.
I need to create a new column weekssincelastevent which will be 0 for the week where event=1 and then 1,2,3,4 etc for the weeks afterwards. If there is a later event then it starts from 0 again. E.g.
personID | weeknumber | event | weekssincelastevent |
---|---|---|---|
1 | 1 | 0 | NULL |
1 | 2 | 0 | NULL |
1 | 3 | 1 | 0 |
1 | 4 | 0 | 1 |
1 | 5 | 0 | 2 |
1 | 6 | 0 | 3 |
2 | 1 | 0 | NULL |
2 | 2 | 1 | 0 |
2 | 3 | 0 | 1 |
2 | 4 | 1 | 0 |
2 | 5 | 0 | 1 |
The column should be NULL before the first events and all values NULL where a personID never has event.
I can't think how to write this in SQL.
The table has ~600m rows (60m personIDs with 100 weeknumbers each, although some personIDs don't have all the weeknumbers).
Many thanks for any insight.
CodePudding user response:
This is a bit of a gaps and island problem here. The first part, in the CTE, puts the data into "groups". Each time there is an event that's a new group. it also calculates the number of weeks that past since the prior week (which is set to 0
for rows hosting an event). Then in the outer query we SUM
the number of weeks past in each group, giving the number of weeks that have passed:
WITH Groups AS(
SELECT PersonID,
WeekNumber,
Event,
COUNT(CASE Event WHEN 1 THEN 1 END) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Events,
CASE Event WHEN 0 THEN WeekNumber - LAG(WeekNumber) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC) ELSE 0 END AS WeeksPassed
FROM dbo.YourTable)
SELECT PersonID,
WeekNumber,
Event,
CASE WHEN Events = 0 THEN NULL
ELSE SUM(WeeksPassed) OVER (PARTITION BY PersonID, Events ORDER BY WeekNumber ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
END AS WeekSinceLastEvent
FROM Groups;
CodePudding user response:
You can do this with a conditional aggregate within a windowed function:
SELECT t.PersonID,
t.WeekNumber,
t.Event,
WeeksSinceLastEvent = t.WeekNumber - MAX(CASE WHEN t.Event = 1 THEN t.WeekNumber END)
OVER(PARTITION BY t.PersonID ORDER BY t.WeekNumber)
FROM dbo.T AS t;
The key parts are:
CASE WHEN t.Event = 1 THEN t.WeekNumber END
Only consider week number where it is a valid event. SinceMAX
with ignore nulls this will only consider relevant rowsOVER (PARTITION BY t.PersonID ORDER BY t.WeekNumber)
- Only consider rows for the current person, where the weeknumber is lower than the current row.