I'm trying to select the last value in a column that isn't blank (not non-null technically) and select it for every date after, until that value changes, then select that value and so on.
What I have:
company_id | date | sales_stage | previous_sales_stage |
---|---|---|---|
1 | 2022-05-20 00:00:00.000 | a | NULL |
1 | 2022-05-19 00:00:00.000 | b | NULL |
1 | 2022-05-18 00:00:00.000 | c | NULL |
1 | 2022-05-17 00:00:00.000 | c | NULL |
1 | 2022-05-16 00:00:00.000 | c | NULL |
1 | 2022-05-15 00:00:00.000 | d | NULL |
1 | 2022-05-14 00:00:00.000 | d | NULL |
1 | 2022-05-13 00:00:00.000 | d | NULL |
1 | 2022-05-12 00:00:00.000 | e | NULL |
1 | 2022-05-11 00:00:00.000 | e | NULL |
What I'd like to have:
company_id | date | sales_stage | previous_sales_stage |
---|---|---|---|
1 | 2022-05-20 00:00:00.000 | a | b |
1 | 2022-05-19 00:00:00.000 | b | c |
1 | 2022-05-18 00:00:00.000 | c | d |
1 | 2022-05-17 00:00:00.000 | c | d |
1 | 2022-05-16 00:00:00.000 | c | d |
1 | 2022-05-15 00:00:00.000 | d | e |
1 | 2022-05-14 00:00:00.000 | d | e |
1 | 2022-05-13 00:00:00.000 | d | e |
1 | 2022-05-12 00:00:00.000 | e | NULL |
1 | 2022-05-11 00:00:00.000 | e | NULL |
This is going in a summary table that is shared with all companies ( so there will be multiple company ids and stages for a given date) and is calculated daily from a stored proc. If there is no value, NULL
is okay.
Here is some T-SQL to create a temp table that recreates this example :
DROP TABLE IF EXISTS #blah
CREATE TABLE #blah
(
company_id INT
, [date] DATETIME
, sales_stage VARCHAR(50)
, previous_sales_stage VARCHAR(50)
);
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'a',CAST(GETDATE() AS DATE))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'b',dateadd(d,-1,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'c',dateadd(d,-2,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'c',dateadd(d,-3,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'c',dateadd(d,-4,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'d',dateadd(d,-5,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'d',dateadd(d,-6,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'d',dateadd(d,-7,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'e',dateadd(d,-8,cast(getdate() as date)))
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES (1,'e',dateadd(d,-9,cast(getdate() as date)))
SELECT * FROM #blah
UPDATE #blah SET previous_sales_stage = 'b' WHERE company_id = 1 AND date = '2022-05-20 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'c' WHERE company_id = 1 AND date = '2022-05-19 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'd' WHERE company_id = 1 AND date = '2022-05-18 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'd' WHERE company_id = 1 AND date = '2022-05-17 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'd' WHERE company_id = 1 AND date = '2022-05-16 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'e' WHERE company_id = 1 AND date = '2022-05-15 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'e' WHERE company_id = 1 AND date = '2022-05-14 00:00:00.000'
UPDATE #blah SET previous_sales_stage = 'e' WHERE company_id = 1 AND date = '2022-05-13 00:00:00.000'
UPDATE #blah SET previous_sales_stage = NULL WHERE company_id = 1 AND date = '2022-05-12 00:00:00.000'
UPDATE #blah SET previous_sales_stage = NULL WHERE company_id = 1 AND date = '2022-05-11 00:00:00.000'
SELECT * FROM #blah
CodePudding user response:
You can use a subquery or an OUTER APPLY, like this:
SELECT t1.company_id, t1.date, t1.sales_stage, x.previous_sales_stage
FROM #blah t1
OUTER APPLY (
SELECT TOP 1 t2.sales_stage AS previous_sales_stage
FROM #blah t2
WHERE t2.company_id=t1.company_id
AND t2.date<t1.date
AND t2.sales_stage<>t1.sales_stage
ORDER BY t2.date DESC
) x